Clustering Personalized Web Search Results
Xuehua Shen and Hong Cheng


Introduction
Search engine’s objectives
– Rank the most relevant search results at the top
  Effectiveness
  PageRank / HITS
– Group and present different categories of search results
  Global view
  Clustering

Clustering Personalized Search Results
Study the clustering problem in the UCAIR framework
Personalized search ranks or reranks the search results based on implicit user feedback
This raises interesting problems
– Efficient and effective clustering/presentation
– Dynamically updating the clustering results based on personalization

Goal
Effective
– Cluster user search results into meaningful groups
– Present in a clear format
– Provide users with main themes of search results
Efficient
– Implement efficient clustering algorithms
Dynamic
– Dynamically maintain the clustering results based on personalized ranking and reranking

Progress
Implemented two clustering algorithms
– K-Medoids
– Hierarchical clustering
Presentation
– Replace Google ads with clustering results
– Present ranked results together with clustering results
– Two presentation strategies
  Most centrally located document in each cluster
  Most frequent terms in each cluster

Partial Results
K-Medoids
– Select the most centrally located document in each cluster as the cluster center
– Present these central documents (medoids) as each cluster’s representative
– Efficiency is relatively poor
  Other processing time: 2152 ms
  Cluster search results time: 2844 ms
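The K-Medoids idea on this slide can be sketched as follows. This is a simplified alternating variant (assign each document to its nearest medoid, then re-pick the most central member), not necessarily the procedure the authors implemented. The function names, the precomputed distance matrix `dist`, and the choice of the first `k` documents as initial medoids are all assumptions made for illustration.

```python
def assign(dist, medoids):
    """Label each document with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Cluster n documents given a symmetric n x n distance matrix.

    Returns (medoids, labels): medoids are document indices, and
    labels[i] is the cluster index of document i.
    """
    medoids = list(range(k))  # assumption: first k documents seed the clusters
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # Most centrally located member: smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy usage: six "documents" placed on a line, distance = absolute difference.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
```

The returned medoid indices identify the documents that would be shown as cluster representatives under the first presentation strategy.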

Partial Results (II)
Hierarchical clustering
– Merge similar documents in a pair-wise manner
– Use weighted average term vectors to represent the cluster center
– Present centroid term vectors as a virtual document (output Top-K terms)
– Efficiency better than K-Medoids
  Other processing time: 1141 ms
  Cluster search results time: 661 ms
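A minimal sketch of the hierarchical approach described on this slide, under stated assumptions: documents are sparse term-weight dictionaries, similarity is cosine, and the "weighted average" is weighted by cluster size. The slide does not specify any of these choices, and all function names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def merge(a, size_a, b, size_b):
    """Size-weighted average of two centroid term vectors (assumed weighting)."""
    total = size_a + size_b
    return {
        t: (a.get(t, 0.0) * size_a + b.get(t, 0.0) * size_b) / total
        for t in set(a) | set(b)
    }


def agglomerate(vectors, k):
    """Merge the most similar pair of clusters until only k remain.

    Returns a list of (centroid_vector, member_indices) pairs.
    """
    clusters = [(dict(v), [i]) for i, v in enumerate(vectors)]
    while len(clusters) > k:
        # Naive scan over all pairs for the most similar centroids.
        i, j = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: cosine(clusters[pq[0]][0], clusters[pq[1]][0]),
        )
        (ca, ma), (cb, mb) = clusters[i], clusters[j]
        merged = (merge(ca, len(ma), cb, len(mb)), ma + mb)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def top_k_terms(centroid, k):
    """The k highest-weighted terms of a centroid: the 'virtual document'."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    {"apple": 3, "fruit": 1},
    {"apple": 2, "fruit": 1},
    {"python": 3, "code": 1},
    {"python": 2, "code": 1},
]
clusters = agglomerate(docs, k=2)
labels = {tuple(members): top_k_terms(centroid, 1) for centroid, members in clusters}
```

Each cluster's label is read directly off its centroid vector, so no single real document has to stand in for the cluster.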

Efficiency Analysis
K-Medoids
– $O(k(n-k)^2)$ for each iteration, where $n$ is the number of documents and $k$ is the number of clusters
– Needs multiple iterations for convergence
Hierarchical clustering
– $O(n^2)$ for each iteration
– Needs $n-k$ iterations
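Combining the per-iteration costs with the iteration counts gives a rough total for each algorithm. In the worked figures below, $t$ (the number of K-Medoids iterations) and the values $n = 100$, $k = 5$ are assumptions chosen only for illustration; they do not come from the slides.

```latex
\begin{align*}
T_{\text{K-Medoids}}    &= t \cdot O\bigl(k(n-k)^2\bigr) \\
T_{\text{hierarchical}} &= (n-k) \cdot O(n^2) = O(n^3) \\[4pt]
\text{For } n = 100,\ k = 5:\quad
k(n-k)^2        &= 5 \cdot 95^2 = 45{,}125 \ \text{per iteration} \\
(n-k) \cdot n^2 &= 95 \cdot 10{,}000 = 950{,}000 \ \text{in total}
\end{align*}
```

On these operation counts alone, K-Medoids would need roughly 21 iterations before its total matched the hierarchical one. Asymptotic counts ignore constant factors and implementation details, so they should not be read as a prediction of the measured timings.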

Lessons Learned
Clustering takes longer as more search results accumulate (when we click “Next”)
Top-K frequent terms in each cluster sometimes do not make sense
– Combine additional information besides term frequency
Search results are re-clustered each time they are reranked
– Incremental update of clustering results is desired!

Remaining
Implementation
– KMeans
– MMR
– Frequent word sets
Effective presentation study
– Based on user feedback
– Literature survey
Dynamic maintenance of clustering based on search result ranking and reranking
– Drill down in a particular cluster
– Update overall clustering organization
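Assuming "MMR" in the list above refers to maximal marginal relevance, the selection rule could be sketched as below. The slides give no details, so the function name, the inputs (`relevance` scores and a document-to-document `similarity` matrix), and the trade-off parameter `lam` are all illustrative assumptions.

```python
def mmr(relevance, similarity, k, lam=0.7):
    """Greedily pick k documents balancing relevance against redundancy.

    relevance[i]      -- relevance of document i to the query
    similarity[i][j]  -- similarity between documents i and j
    lam               -- 1.0 favors relevance only; lower values favor diversity
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
picked = mmr(relevance, similarity, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate is skipped in favor of the dissimilar document, which is the behavior that makes this kind of rule relevant to presenting distinct themes in a result list.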

Feedback
Which way to present clustering results is more meaningful?
– Based on central documents
– Based on term vectors
– More options?
Any other clustering algorithms to achieve effectiveness and efficiency?
Any other presentation strategy besides “rank list + cluster center”?