Summarizing Encyclopedic Term Descriptions on the Web from Coling 2004 Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media.

Presentation transcript:

Summarizing Encyclopedic Term Descriptions on the Web from Coling 2004 Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media Studies, University of Tsukuba

Motivation: Existing encyclopedias often lack new terms and new definitions for existing terms. The Web contains an enormous volume of up-to-date information and is a source for obtaining new term descriptions. However, using existing search engines for this purpose has many problems.

Problems with search engines: They often retrieve extraneous pages that do not describe the submitted term. The user has to identify the page fragments that describe the term. Descriptions in multiple pages are independent of one another. Word senses are not distinguished for ambiguous terms.

The authors propose a summarization method that produces a concise, condensed term description from multiple paragraphs. The paper focuses on Japanese technical terms in the computer domain.

Overview of CYCLONE

Summarization Method: Given a set of paragraph-style descriptions for a single term in a specific domain, the summarization method produces a concise text describing the term from different viewpoints. There are 12 viewpoints in the computer domain: definition, abbreviation, exemplification, purpose, synonym, reference, product, advantage, drawback, history, component, and function.

Four steps. Identification: recognize the language unit associated with a viewpoint. Classification: merge units with the same viewpoint into a single group. Selection: determine one or more representative units for each group. Presentation: produce a summary in a format.

Identification: A sentence is often associated with multiple viewpoints, e.g. "XML is an abbreviation for eXtensible Markup Language, and is a markup language." Japanese sentences are segmented into simple sentences, and zero pronoun detection and anaphora resolution can be applied. The example yields two simple sentences: "XML is an abbreviation for eXtensible Markup Language" (abbreviation viewpoint) and "XML is a markup language" (definition viewpoint).

Classification: For the 12 viewpoints, 36 linguistic patterns are used; each pattern is typical of describing a term from a specific viewpoint. Simple sentences that match patterns, possibly for multiple viewpoints, are classified into the matching viewpoint groups.

Classification (cont): What about sentences that do not match any pattern? Each remaining sentence is classified into the group to which its most similar sentence belongs. The similarity between an unclassified sentence and each of the classified sentences is computed with the Dice coefficient. There is also a "miscellaneous" group.
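
The two classification slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three English regular expressions stand in for the 36 Japanese patterns (which the slides do not list), the Dice coefficient is computed over whitespace-separated word sets, and the `threshold` below which a sentence falls into the "miscellaneous" group is a hypothetical parameter, not a value taken from the slides.

```python
import re

# Hypothetical English stand-ins for the linguistic patterns; the real
# system uses 36 Japanese patterns that are not given in the slides.
PATTERNS = {
    "abbreviation": re.compile(r"\bis an abbreviation for\b"),
    "purpose": re.compile(r"\bis used (to|for)\b"),
    "definition": re.compile(r"\bis an? (?!abbreviation\b)\w+"),
}


def dice(a, b):
    """Dice coefficient between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return 2 * len(wa & wb) / (len(wa) + len(wb))


def classify(sentences, threshold=0.3):
    """Group sentences by viewpoint: patterns first, then Dice similarity."""
    groups = {}
    unmatched = []
    for s in sentences:
        hits = [view for view, pat in PATTERNS.items() if pat.search(s)]
        if hits:
            for view in hits:  # a sentence may match several viewpoints
                groups.setdefault(view, []).append(s)
        else:
            unmatched.append(s)

    # Snapshot of the pattern-classified sentences with their viewpoints.
    classified = [(s, view) for view, members in groups.items() for s in members]
    for s in unmatched:
        best_score, best_view = 0.0, None
        for other, view in classified:
            score = dice(s, other)
            if score > best_score:
                best_score, best_view = score, view
        # Assumed rule: too little overlap sends the sentence to "miscellaneous".
        if best_view is None or best_score < threshold:
            best_view = "miscellaneous"
        groups.setdefault(best_view, []).append(s)
    return groups
```

Calling `classify` on a list of simple sentences returns a dictionary that maps each viewpoint name to the sentences assigned to it.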

example

Selection: The number of sentences selected from each group depends on the desired size of the resulting summary. A score is computed for each sentence, and the sentences with the highest scores in each group are selected. Three factors are used: the number of common words included (W), where sentences including frequent words are preferred; the rank order in CYCLONE (R); and the number of characters included (C), where short sentences are preferred. Each factor is normalized, and the final score is a weighted average of the three factors (W > R > C).
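
A minimal sketch of this scoring scheme follows. Several details are assumptions made for illustration, because the slide does not specify them: min-max normalization within a group, counting a word as "common" when it occurs in more than one sentence of the group, and the weights 0.5, 0.3 and 0.2, which merely respect the ordering W > R > C.

```python
from collections import Counter


def normalize(values, lower_is_better=False):
    """Min-max normalize to [0, 1]; optionally invert so smaller raw values score higher."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - x for x in scaled] if lower_is_better else scaled


def score_group(sentences, ranks, weights=(0.5, 0.3, 0.2)):
    """Score each sentence of one viewpoint group (non-empty) from W, R and C."""
    word_sets = [set(s.lower().split()) for s in sentences]
    doc_freq = Counter(w for ws in word_sets for w in ws)
    # W: number of words shared with at least one other sentence in the group.
    w_raw = [sum(1 for w in ws if doc_freq[w] > 1) for ws in word_sets]
    w = normalize(w_raw)
    # R: rank of the source paragraph in the search result (1 is best).
    r = normalize(ranks, lower_is_better=True)
    # C: sentence length in characters (shorter is better).
    c = normalize([len(s) for s in sentences], lower_is_better=True)
    ww, wr, wc = weights  # hypothetical weights with ww > wr > wc
    return [ww * a + wr * b + wc * d for a, b, d in zip(w, r, c)]


def select_top(sentences, ranks, k=1):
    """Return the k highest-scoring sentences of a group."""
    scores = score_group(sentences, ranks)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:k]]
```

Here `k` plays the role of the number of representative sentences per group, which the slide ties to the desired summary size.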

Selection (cont): For the miscellaneous group, they select the sentence that is most dissimilar to the representative sentences selected from the regular groups.
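
One way to read "most dissimilar" is sketched below, as an assumption rather than the authors' exact procedure: each miscellaneous sentence is scored by its highest Dice similarity to any already selected representative, and the sentences with the lowest such value are kept. The function name `pick_dissimilar` and the word-set Dice measure are choices made for this illustration.

```python
def dice(a, b):
    """Dice coefficient between the word sets of two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return 2 * len(wa & wb) / (len(wa) + len(wb))


def pick_dissimilar(misc_sentences, representatives, k=1):
    """Keep the k miscellaneous sentences least similar to every representative."""
    def max_similarity(sentence):
        return max((dice(sentence, rep) for rep in representatives), default=0.0)

    # Ascending order: the least similar sentences come first.
    return sorted(misc_sentences, key=max_similarity)[:k]
```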

Presentation

Presentation (cont): The example summary is built from the top 50 paragraphs for the term “XML”. Only one sentence was selected from each group. Each viewpoint label or sentence is hyperlinked to the associated group or the source paragraph.

Evaluation: Summarization evaluation can be classified into intrinsic and extrinsic approaches. Intrinsic evaluation measures the quality of the text itself, such as its informativeness. Extrinsic evaluation measures whether a summary improves the efficiency of a specific task.

Evaluation (cont): 15 Japanese terms are used as test inputs. To calculate the coverage, for each of the 15 terms two students annotated each simple sentence in the top 50 paragraphs of the CYCLONE results with one or more viewpoints. They defined 28 viewpoints, including the 12 viewpoints above. Compression ratio and coverage were calculated over the top 50 paragraphs.
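
The slide names the two measures without defining them. The sketch below assumes the usual definitions: compression ratio as the number of characters in the summary divided by the number of characters in the source paragraphs, and coverage as the fraction of annotated viewpoints in the source paragraphs that also appear in the summary. Both definitions and both function names are assumptions for illustration.

```python
def compression_ratio(summary, paragraphs):
    """Characters in the summary divided by characters in the source paragraphs."""
    total = sum(len(p) for p in paragraphs)
    return len(summary) / total if total else 0.0


def coverage(summary_viewpoints, source_viewpoints):
    """Fraction of the viewpoints annotated in the source that the summary retains."""
    source = set(source_viewpoints)
    if not source:
        return 0.0
    return len(set(summary_viewpoints) & source) / len(source)
```

Under these definitions a lower compression ratio means a shorter summary, and a higher coverage means more viewpoints are preserved.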

Results. #Reps: the number of representative sentences selected from each viewpoint group. #Chars: the number of characters in a summary. Five sentences are selected from the miscellaneous group. VBS: the viewpoint-based summarization method. Lead: a baseline that systematically extracts the top N characters from the CYCLONE results.

Conclusion: To compile encyclopedic term descriptions from the Web, they introduced a summarization method. The method identifies simple sentences, classifies them into viewpoint groups, selects representative sentences from each group, and presents them. VBS achieved a good compression ratio, and its coverage score is better than that of the baseline. Future work includes generating a coherent text and performing an extrinsic evaluation.