Inductive Clustering: A technique for clustering search results
Hieu Khac Le, Department of Computer Science, University of Illinois at Urbana-Champaign

Abstract

Information overload is a common problem today. Search engines, tools that help find needed information across the whole web, solve it only partially: even when a search engine works very well, it returns so many results that users still face information overload. Post-processing search results is a further step toward reducing the problem: the results are organized so that examining them takes minimal effort. This project proposes a novel technique for organizing search results: Inductive Clustering (IC).

Introduction

[Figure: example of an ambiguous query, with cluster titles]
[Figure: example of an unambiguous query, with clusters with high confidence]

Traditional approach

Three essential ingredients:
- Need to define a similarity function
- Need to define a threshold, and/or
- Need to choose the number of clusters in advance

These ingredients heavily affect clustering quality. Unfortunately, there is no guidance on how to tune them, especially the threshold and the number of clusters.

Our approach: Inductive Clustering (IC)

- No threshold or predefined number of clusters is needed.
- Intuitively, results tend to agree with their cluster's summary.
- It is easy to keep splitting a large cluster into smaller clusters.

IC in detail

Observation: the more specific the query, the fewer results we get.

Key idea: generate a summary from the returned results. The results that agree with that summary form the first cluster. Generate a summary for the remaining results; the results that agree with it form the second cluster. Repeat the process until all results are clustered. A large cluster can be clustered further in the same way.

[Diagram labels: user's query, WWW, blue, sub-queries, summarizing, executing query. Summaries for clusters are generated in advance.]

Experiment

We considered the first 100 results returned by Google for 30 queries. The observed clusters show that the algorithm works extremely well.

Average precision with cluster title: 90.5%
Average precision without cluster title: 95.6%
Average precision of cluster's title: 91.4%
Average execution time: 0.27 seconds

Conclusion

Inductive Clustering is a novel technique for post-processing returned search results. Unlike previous approaches, it does not require manually tuned parameters. The experiments show that IC works extremely well: cluster titles are comprehensive, the results in each cluster agree with the titles, and execution time is negligible. Results organized with IC are much easier for users to grasp. We envision IC being implemented as an online service for broad use.

This project was done under the advising of Prof. ChengXiang Zhai.
*hieule2@uiuc.edu - Date: 05/01/2005*
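The key idea described under "IC in detail" (summarize, peel off the results that agree, repeat on the rest) can be illustrated with a minimal Python sketch. The poster does not say how a summary is generated or what it means for a result to agree with one, so this sketch makes two assumptions: the summary is the single term that occurs in the most remaining results, and a result agrees with a summary when it contains that term. The function names, the stop-word list, and the toy "jaguar" data are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the poster does not specify one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "for", "to", "is", "on"})


def tokens(text):
    """Return the set of lower-cased words in a result, minus stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def summarize(results, exclude):
    """ASSUMPTION: the summary is the term contained in the most results."""
    counts = Counter()
    for r in results:
        counts.update(tokens(r) - exclude)
    if not counts:
        return None
    best = max(counts.values())
    # Break ties alphabetically so the output is deterministic.
    return min(t for t, c in counts.items() if c == best)


def inductive_clustering(results, query_terms=()):
    """Peel off one cluster per round until every result is clustered."""
    exclude = {t.lower() for t in query_terms}
    remaining = list(results)
    clusters = []
    while remaining:
        summary = summarize(remaining, exclude)
        if summary is None:
            # Nothing left to summarize; group the leftovers together.
            clusters.append(("other", remaining))
            break
        # ASSUMPTION: a result agrees with a summary if it contains the term.
        agree = [r for r in remaining if summary in tokens(r)]
        clusters.append((summary, agree))
        remaining = [r for r in remaining if summary not in tokens(r)]
    return clusters


# Toy data standing in for snippets returned for the query "jaguar".
toy_results = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar cat habitat",
    "jaguar cat diet",
    "jaguar os",
]
clusters = inductive_clustering(toy_results, query_terms=["jaguar"])
for title, members in clusters:
    print(title, members)
```

Each round removes at least one result, so the loop terminates without a similarity threshold or a preset number of clusters. A large cluster could be split further by calling `inductive_clustering` again on its members with the cluster title added to `query_terms`.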