From Social Bookmarking to Social Summarization: An Experiment in Community-Based Summary Generation Oisin Boydell, Barry Smyth Adaptive Information Cluster,

Presentation transcript:

From Social Bookmarking to Social Summarization: An Experiment in Community-Based Summary Generation Oisin Boydell, Barry Smyth Adaptive Information Cluster, School of Computer Science and Informatics University College Dublin 2007 Intelligent User Interfaces Presented by Sharon HSIAO

Agenda: Introduction; A novel way to generate a social summary; Evaluation & Methodology; Experiments; Discussion; Conclusion

Introduction: Traditional summarization techniques may perform well in general, but they may fail to extract the core content of a document in a way that meets the needs and preferences of individual users or of a community of users.

Summarization
Two broad approaches to summarization:
- Extraction: uses word occurrence and positional information to extract high-scoring sentences. Examples: Open Text Summarizer (OTS), MEAD Summarizer.
- Abstraction: relies heavily on syntax; the representation is conceptual.

Web page Summarization
- HTML markup
- In-linking text
- Search engine click-through
- Sentence-selection algorithm (web content + query click-through): the weight of a query word is increased according to its frequency within the query collection.
- Social summarization: interaction or usage data can be used to good effect to generate high-quality summaries of Web pages.
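The click-through weighting idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the published algorithm: the function names, the simple additive `boost`, and the length-normalised sentence score are all hypothetical. It only shows how terms that recur in the query collection pull sentences containing them up the ranking.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(sentences, queries, boost=1.0):
    """Rank sentences by mean term weight (hypothetical scoring scheme).

    A term's weight is its frequency in the page plus `boost` times its
    frequency in the query collection, so frequent query words count more.
    """
    page_tf = Counter(t for s in sentences for t in tokenize(s))
    query_tf = Counter(t for q in queries for t in tokenize(q))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        total = sum(page_tf[t] + boost * query_tf[t] for t in tokens)
        return total / len(tokens)

    # sorted() is stable, so ties keep their original document order.
    return sorted(sentences, key=score, reverse=True)
```

With `boost=0` the query collection has no effect and all sentences are scored on page term frequency alone.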

Idea of Social Summarization
1. A page p can be associated with a set of queries, Q(p) = {q1, ..., qn}.
2. For a given query qi, the search engine (SE) produces a query-sensitive snippet, S_SE(p, qi), which contains a number of sentence fragments.
3. The social summary for p, SS_SE(p), is constructed by combining the fragments associated with Q(p) and rank-ordering them by importance.

Generating a social summary
1. Extract the snippet texts, S(bi, p), to produce a set of sentence fragments.
2. Normalise the sentence fragments to cope with fragment overlap and subsumption.
3. Score each sentence fragment according to its frequency of occurrence across the snippets.
4. Rank-order the normalised fragments to produce the final summary.
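The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how fragments are delimited or normalised, so the `"..."` separator, the lower-casing, and the rule that a fragment contained in a longer one donates its count to it are all hypothetical choices, as are the function names.

```python
import re
from collections import Counter


def split_fragments(snippet, separator="..."):
    """Split one snippet into sentence fragments (the separator is assumed)."""
    return [part.strip() for part in snippet.split(separator) if part.strip()]


def normalise(fragment):
    """Lower-case and keep alphanumeric tokens so near-duplicates compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", fragment.lower()))


def contains(longer, shorter):
    """True if `shorter` occurs inside `longer` on whole-word boundaries."""
    return f" {shorter} " in f" {longer} "


def social_summary(snippets, max_fragments=None):
    """Return (fragment, score) pairs, highest score first."""
    # Steps 1 and 3: extract fragments and count the snippets each occurs in.
    counts = Counter()
    for snippet in snippets:
        unique = {normalise(f) for f in split_fragments(snippet)}
        counts.update(f for f in unique if f)

    # Step 2 (assumed rule): a fragment subsumed by a longer fragment
    # adds its count to that longer fragment instead of standing alone.
    merged = {}
    for fragment in sorted(counts, key=len, reverse=True):
        host = next((kept for kept in merged if contains(kept, fragment)), None)
        if host is None:
            merged[fragment] = counts[fragment]
        else:
            merged[host] += counts[fragment]

    # Step 4: rank by score, breaking ties alphabetically for determinism.
    ranked = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_fragments]
```

Passing `max_fragments` caps the summary length; leaving it as `None` returns every ranked fragment.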

Setup & Methodology
- Data from Del.icio.us: 3781 bookmarked pages, with tags up to a maximum of 50 per page.
- 1386 pages contained description text within the HTML meta-content description tag.
- Compared with OTS and MEAD.
- Lucene snippet generator (Apache Foundation).
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation): compares generated summaries to a gold standard by counting overlapping n-grams, word sequences, and word pairs.
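To make the n-gram overlap idea concrete, here is a minimal sketch of a ROUGE-N style recall score. It is a simplified illustration, not the official ROUGE toolkit: whitespace tokenisation and lower-casing are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate.

    Counts are clipped, so a repeated n-gram in the reference is only
    matched as many times as it occurs in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, scoring the candidate "the cat on the mat" against the reference "the cat sat on the mat" matches 5 of the 6 reference unigrams and 3 of the 5 reference bigrams.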

Experiment 1: Comparison of Summary Quality. The average length of SS summaries was 24% of the original.

Experiment 2: Summary Length vs. Quality. Considers the quality of summaries of different lengths, obtained by eliminating low-scoring fragments from the final social summary.
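Shortening a summary by dropping low-scoring fragments could look like the sketch below. The `keep_ratio` parameter and the (text, score) pair representation are assumptions made for illustration; the slides do not state how the different lengths were chosen.

```python
import math


def truncate_summary(scored_fragments, keep_ratio):
    """Keep the highest-scoring share of fragments (hypothetical helper).

    scored_fragments: list of (text, score) pairs.
    keep_ratio: fraction of fragments to keep, in the range (0, 1].
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Stable sort: fragments with equal scores keep their input order.
    ranked = sorted(scored_fragments, key=lambda pair: -pair[1])
    keep = max(1, math.ceil(len(ranked) * keep_ratio))
    return [text for text, _ in ranked[:keep]]
```

Sweeping `keep_ratio` over a range of values yields summaries of different lengths for the same page, each of which can then be scored separately.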

Experiment 3: Search Activity vs. Quality. Considers the relationship between the number of available cues (bookmark tags, in this case) and summary quality. Query sets of size 1-10, 11-20, 21-30, , and queries selected randomly, producing nearly 25,000 different summaries in total.

SS produces summaries with recall scores that are 31% better than the OTS summaries and approximately 28% better than the MEAD summaries

Discussion
Query-Focused Social Summaries
- Generate a more focused social summary, SS(p, qT), informed perhaps by the context provided by some target user query qT.
- Top-ranking results may be associated with longer (more detailed) social summaries than lower-ranking results.

Community-Focused Social Summaries
- The social summarization technique can be used to generate query-focused snippets that better reflect the niche needs of a particular community of searchers.
- Identify those queries that have led to the past selection of p by community members and that are similar to qT.
- E.g. “Jaguar parts”: “Genuine Jaguar, Land Rover and Range Rover OEM and brand name aftermarket parts”; “The one-stop-shop for genuine restoration Jaguar parts for all classic models including S Type, X Type, X300 - XJR,...”
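Selecting the community's past queries that resemble the target query qT might be sketched as below. The slides do not name a similarity measure, so Jaccard overlap on query terms, the `threshold` value, and the (query, snippet) pair layout are all assumptions chosen for illustration.

```python
def jaccard(query_a, query_b):
    """Jaccard overlap between the term sets of two queries."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def community_snippets(target_query, past_selections, threshold=0.5):
    """Return snippets whose originating query is similar to the target.

    past_selections: (query, snippet) pairs recorded when community
    members selected page p (a hypothetical log format).
    """
    scored = [(jaccard(target_query, query), snippet)
              for query, snippet in past_selections]
    # Most similar first; the stable sort keeps log order among ties.
    scored.sort(key=lambda pair: -pair[0])
    return [snippet for similarity, snippet in scored if similarity >= threshold]
```

The returned snippets could then be fed to whatever fragment scoring and ranking procedure is used for the ordinary social summary, so that only community-relevant evidence contributes.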

Preliminary results
- Extracted the top 100 bookmarked pages for the tag “travel”.
- Then extracted the top bookmark tags used to label each of these pages to generate a new set of tags (e.g. European travel, travel tips…).
- 1153 bookmarked pages, 5291 unique sets of terms, 6290 unique users.
- Training & test sets: 5 random training/test splits.

Conclusion
- The social summarization technique produces higher-quality summaries.
- Query-focused social summaries provide searchers with improved result-snippet summaries.
- Community-focused summaries better reflect the needs of communities of like-minded users.