Clustering User Queries of a Search Engine
Ji-Rong Wen, Jian-Yun Nie & Hong-Jiang Zhang

Presentation transcript:

Smarter searches
Search engines were moving beyond simple keyword matching. The big idea was to "understand" users' queries, then suggest similar queries. The significance of these "similar queries": other users have asked them and received correct answers.

Two assumptions
1. If users click on the same documents after submitting different queries, then the queries are similar.
2. If a set of documents is often selected for a set of queries, then the terms in the documents are related to the terms in the queries.
Key point: using keywords alone, such similar queries would have been scattered across multiple clusters.

The aims
The editors were seeking to improve the encyclopaedia so that users could locate information more precisely. In particular:
1. If Encarta does not provide sufficient information for a frequently asked query (FAQ), improve the entries.
2. If an FAQ is emerging as a "hot topic", check the result set and provide direct links.
This paper is about helping with issue 2.

Raw material
User logs for searches against the online Encarta encyclopaedia. "Session" here means query session rather than user session:
session := queryText [clickedDocument]*
The Encarta titles were carefully crafted, so the assumption is that user clicks were based on relevance.
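A minimal sketch of this session structure in Python. The log format (tab-separated "Q"/"C" lines) is a hypothetical stand-in; the paper does not specify Encarta's actual log layout.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """One query session: the query text plus zero or more clicked documents."""
    query: str
    clicks: list[str] = field(default_factory=list)

def parse_log(lines):
    """Parse a raw log into sessions.

    Assumes a hypothetical format: a 'Q<TAB>text' line opens a session,
    and each following 'C<TAB>doc_title' line records a click for it.
    """
    sessions = []
    for line in lines:
        tag, _, value = line.rstrip("\n").partition("\t")
        if tag == "Q":
            sessions.append(Session(query=value))
        elif tag == "C" and sessions:
            sessions[-1].clicks.append(value)
    return sessions

# Example: one session for "atomic bomb" with two clicked articles.
print(parse_log(["Q\tatomic bomb", "C\tManhattan Project", "C\tHiroshima"]))
```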

Clustering principles
1. Using query contents. If two queries contain the same or similar terms, they denote the same or similar information needs. More useful for longer queries.
2. Using document clicks. If two queries lead to the selection of the same documents, then they are similar.
Both principles were used.

Clustering algorithm requirements
1. No manual configuration of the clusters
2. Filter out queries with low frequencies
3. Fast
4. Incremental
They selected DBSCAN and incremental DBSCAN, but provided their own similarity function (sketched below).
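A sketch of how a custom similarity function can drive DBSCAN, here via scikit-learn's precomputed-distance mode. The eps and min_samples values are illustrative, not the paper's settings, and the incremental variant is not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_queries(queries, similarity):
    """Cluster queries with DBSCAN over an arbitrary similarity function.

    DBSCAN expects distances, so we use 1 - similarity. Queries that end
    up in no cluster get label -1 (noise), which in effect filters out
    low-frequency queries that resemble nothing else.
    """
    n = len(queries)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dist[i, j] = 1.0 - similarity(queries[i], queries[j])
    return DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(dist)
```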

Similarity Based on Query Contents
Keyword overlap, the same form as the click-based measure later in the talk:
similarity_keyword(p, q) = KN(p, q) / max(kn(p), kn(q))
where KN(p, q) is the number of keywords common to queries p and q, and kn(·) is the number of keywords in a query.
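A direct implementation of this keyword measure; queries are assumed to be pre-tokenized into keyword sets, with recognized phrases counting as single terms (per the refinements slide that follows).

```python
def keyword_similarity(p, q):
    """similarity_keyword(p, q) = KN(p, q) / max(kn(p), kn(q)).

    p and q are sets of keywords; a recognized phrase counts as one keyword.
    """
    if not p or not q:
        return 0.0
    return len(p & q) / max(len(p), len(q))

# Two of three keywords shared -> 2 / 3.
print(keyword_similarity({"atomic", "bomb"}, {"atomic", "bomb", "history"}))
```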

Plus refinements: if phrases can be identified, they can be treated as a single term in the calculations. This was easy in this case, as Encarta supplied a dictionary of phrases. There were plans to include syntactic analysis to identify noun phrases.

Similarity Based on Query Contents
Similarity based on edit distance: the number of insertions, deletions, and/or replacements needed to transform one query into the other. Found to be useful for long and complex queries in preliminary tests. Implemented? Also mentioned: the possibility of using WordNet synonyms.
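A word-level Levenshtein sketch; the slide does not say how the distance was normalized into a similarity, so dividing by the longer query's length is an assumption here.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance: insertions, deletions, replacements."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement
    return d[m][n]

def edit_similarity(p, q):
    """Map distance into [0, 1]; this normalization is an assumption."""
    wp, wq = p.split(), q.split()
    return 1.0 - edit_distance(wp, wq) / max(len(wp), len(wq), 1)
```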

Similarity Based on User Feedback
Single documents:
similarity_doc(p, q) = RD(p, q) / max(rd(p), rd(q))
where RD(p, q) is the number of documents clicked for both queries and rd(·) is the number of documents clicked for one query.
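The formula translates directly to set operations over the clicked-document lists from each session:

```python
def click_similarity(clicks_p, clicks_q):
    """similarity_doc(p, q) = RD(p, q) / max(rd(p), rd(q)).

    clicks_p / clicks_q: documents clicked for queries p and q.
    """
    rd_p, rd_q = set(clicks_p), set(clicks_q)
    if not rd_p or not rd_q:
        return 0.0
    return len(rd_p & rd_q) / max(len(rd_p), len(rd_q))

# One shared click, two distinct clicks per query -> 1 / 2.
print(click_similarity(["Manhattan Project", "Hiroshima"],
                       ["Hiroshima", "Nagasaki"]))
```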

Similarity Based on User Feedback
Encarta documents are hierarchical: a concept taxonomy. The lower the common branch, the higher the similarity.
s(d_i, d_j) = (L(F(d_i, d_j)) - 1) / L_Total
where F(d_i, d_j) is the lowest common ancestor of the two documents, L(·) gives a node's level (root = 1), and L_Total is the total number of levels in the hierarchy.
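A sketch assuming each document is represented by its path of taxonomy nodes from the root; the lowest common ancestor's level is then the length of the shared prefix of the two paths.

```python
def hierarchy_similarity(path_i, path_j, total_levels):
    """s(d_i, d_j) = (L(F(d_i, d_j)) - 1) / L_Total.

    path_*: taxonomy nodes from the root down to each document.
    The shared-prefix length is the level of the lowest common ancestor.
    """
    lca_level = 0
    for a, b in zip(path_i, path_j):
        if a != b:
            break
        lca_level += 1
    return (lca_level - 1) / total_levels

# Documents sharing the first two levels of a 4-level taxonomy:
# (2 - 1) / 4 = 0.25.
print(hierarchy_similarity(["Science", "Physics", "Nuclear"],
                           ["Science", "Physics", "Optics"], total_levels=4))
```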

Outcomes
The authors stated the need for more empirical results, but were happy with their progress. But the paper presents no actual results. Their approach was certainly successful in detecting similarities missed by other approaches.