Collaborative Ranking Function Training for Web Search Personalization
Giorgos Giannopoulos (IMIS/"Athena" R.C. and NTU Athens, Greece), Theodore Dalamagas (IMIS/"Athena" R.C., Greece), Timos Sellis (IMIS/"Athena" R.C. and NTU Athens, Greece)
National Technical University of Athens, School of Electrical and Computer Engineering, Division of Computer Science
Institute for the Management of Information Systems, "Athena" Research Center

Intro: How to personalize search results?
Step 1. Implicit feedback (clicks from user logs) or explicit feedback provides relevance judgments, i.e. irrelevant, partially relevant, relevant.
Step 2. Extract features from query-result pairs, e.g. (1) text similarity between query and result, (2) rank of the result in Google, (3) domain of the result URL.
Step 3. Feed a ranking function (e.g. Ranking SVM) with the judgments and the features, yielding a trained ranking function.
Step 4. Re-rank the results using the trained function.
[Figure: an example result list annotated with judgments (irrelevant / partially relevant / relevant / unjudged), shown before and after re-ranking.]
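As a rough illustration of Steps 1 and 2, the sketch below builds labeled feature vectors from a query, its result list and the clicked URLs. The feature set, the click-based labeling heuristic and the helper names (text_similarity, extract_features, training_examples) are illustrative assumptions, not taken from the paper.

```python
# Sketch of Steps 1-2: derive judgments from clicks and extract query-result features.
from urllib.parse import urlparse

JUDGMENTS = {"irrelevant": 0, "partially_relevant": 1, "relevant": 2}

def text_similarity(query, text):
    # Toy similarity: fraction of query terms that also appear in the text.
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / max(len(q_terms), 1)

def extract_features(query, result):
    """Step 2: build a feature vector for one query-result pair.
    `result` is assumed to be a dict with 'title', 'abstract', 'url' and 'rank' keys."""
    domain = urlparse(result["url"]).netloc
    return [
        text_similarity(query, result["title"]),     # text similarity query vs. title
        text_similarity(query, result["abstract"]),  # text similarity query vs. abstract
        1.0 / result["rank"],                         # rank of the result in the original engine
        1.0 if domain.endswith(".edu") else 0.0,      # one indicator for the URL's domain
    ]

def training_examples(query, results, clicked_urls):
    """Step 1 (crude heuristic): clicked results are judged relevant, the rest irrelevant."""
    examples = []
    for r in results:
        label = JUDGMENTS["relevant"] if r["url"] in clicked_urls else JUDGMENTS["irrelevant"]
        examples.append((extract_features(query, r), label))
    return examples
```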

Problem I: Users usually search in more than one area of interest.
Example scenario 1: A PhD student searches for papers on "web search ranking". They would prefer clicking on results with "acm" in the title or URL, and would also prefer PDF results. The same student searches for information about the "Samsung Omnia cellphone". There, they would prefer results with "review", "specs" or "hands on" in the title or abstract, and results from blogs, forums or videos.
Training a single ranking function model for this user could favor video results while they search for papers, or favor PDF results (e.g. the cellphone manual) while they search for reviews of a cellphone.

Problem II: Even users with different search behaviors may share a common search area.
Example scenario 2: User A is a PhD student in IR and searches mostly for papers in their own area. User B is a linguist and searches mostly for papers in their own area. However, both could be interested in new cellphones.
Training a common ranking function model for both users would probably give a better model for searches on cellphones, but a worse model for the rest of their searches.
Training a single ranking function model for each user, on the other hand, would not exploit each user's behavior on the common search areas. For example, user A may be familiar with a very informative site about cellphones, while user B is not. Training a common ranking function on this particular search area would favor that site in both users' searches; as a result, user B would become aware of the site and use it in future searches.

Solution: Train multiple ranking functions. Each ranking function corresponds not to a single user, nor to a group of users, but to a topic area: a group of search results with similar content, collected from all users. When re-ranking search results, check which topic areas match each new query and re-rank the query's results according to the ranking functions trained for those topic areas.

Our method (phase 1): Clustering on the clicked results of all queries of all users. Clicked results are more informative than the full result list or the query itself. We use partitional clustering (repeated bisections).
Result representation: a term vector of size N (the number of distinct terms over all results), where each feature is a weight w relating a result to a term; w depends on the term's tf and idf. The title and abstract are taken as the result's text. Cosine similarity on the term vectors is the metric used to compare two results, and the clustering criterion function maximizes intra-cluster similarity.
Output: clusters containing (clicked) results with similar content (topic clusters).
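A minimal sketch of phase 1, assuming clicked results are available as dicts with 'title' and 'abstract' fields. Repeated-bisections clustering is approximated here with k-means on L2-normalized tf-idf vectors, which behaves roughly like cosine-based clustering, so this is not the authors' exact setup.

```python
# Sketch of phase 1: cluster clicked results by tf-idf vectors of their title + abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def build_topic_clusters(clicked_results, n_clusters=10):
    texts = [r["title"] + " " + r["abstract"] for r in clicked_results]

    # Term vectors weighted by tf-idf; L2 normalization makes Euclidean k-means
    # behave approximately like cosine-similarity clustering.
    vectorizer = TfidfVectorizer(norm="l2", stop_words="english")
    X = vectorizer.fit_transform(texts)

    # Stand-in for repeated-bisections partitional clustering.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(X)

    clusters = {}
    for result, label in zip(clicked_results, labels):
        clusters.setdefault(label, []).append(result)
    return clusters, vectorizer
```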

Our method (phase 2): Cluster indexing, so that the similarity of each new query with each cluster can be computed. The title and abstract text of all results belonging to a cluster is extracted and used as the textual representation of that cluster; the clusters are then indexed as documents.
Output: an inverted index over the clusters' textual representations.
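The sketch below stands in for the inverted index: each cluster is turned into one "document" from the titles and abstracts of its results, and query-cluster similarity w_qi is computed with a tf-idf model. The function names and data layout are assumptions for illustration.

```python
# Sketch of phase 2: index clusters as documents and score query-cluster similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def index_clusters(clusters):
    """clusters: dict mapping cluster id -> list of result dicts with 'title'/'abstract'."""
    ids = sorted(clusters)
    docs = [" ".join(r["title"] + " " + r["abstract"] for r in clusters[cid]) for cid in ids]
    vectorizer = TfidfVectorizer(stop_words="english")
    cluster_matrix = vectorizer.fit_transform(docs)
    return ids, vectorizer, cluster_matrix

def query_cluster_weights(query, ids, vectorizer, cluster_matrix):
    """Return w_qi: the textual similarity of the query with each cluster."""
    q_vec = vectorizer.transform([query])
    sims = cosine_similarity(q_vec, cluster_matrix)[0]
    return dict(zip(ids, sims))
```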

Our method (phase 3): Multiple ranking function training using Ranking SVM. Each ranking function model F_i is trained with clickstream data only from the corresponding cluster i.
Features used:
Textual similarity between the query and the (title, abstract, URL) of the result
Domain of the result (.com, .edu, etc.)
Rank in Google
Special words ("blog", "forum", "wiki", "portal", etc.) found in the result's title, abstract or URL, with features denoting the textual similarity of each such word with the title/abstract/URL
URL suffix (.html, .pdf, .ppt)
The 100 most frequent words in all result documents of all searches, again with features denoting the textual similarity of each word with the title/abstract/URL
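A hedged sketch of the per-cluster training step: Ranking SVM is approximated by a linear SVM over pairwise feature-vector differences (a standard reduction), rather than the actual Ranking SVM tool. It assumes per-query lists of (features, judgment) pairs, e.g. produced by a helper like the extract_features sketch above.

```python
# Sketch of phase 3: train one ranking model per topic cluster from its clickstream data.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(examples):
    """examples: list of (feature_vector, judgment) for one query.
    Emit difference vectors labeled by which side should be ranked higher."""
    X, y = [], []
    for (f1, j1) in examples:
        for (f2, j2) in examples:
            if j1 > j2:  # f1 should be ranked above f2
                X.append(np.asarray(f1) - np.asarray(f2))
                y.append(1)
                X.append(np.asarray(f2) - np.asarray(f1))
                y.append(-1)
    return np.array(X), np.array(y)

def train_cluster_rankers(training_data_per_cluster):
    """training_data_per_cluster: dict cluster id -> list of per-query example lists."""
    rankers = {}
    for cid, queries in training_data_per_cluster.items():
        X_parts, y_parts = [], []
        for examples in queries:
            X_q, y_q = pairwise_transform(examples)
            if len(y_q):
                X_parts.append(X_q)
                y_parts.append(y_q)
        if not X_parts:
            continue  # no preference pairs for this cluster
        model = LinearSVC()
        model.fit(np.vstack(X_parts), np.concatenate(y_parts))
        rankers[cid] = model  # score a result with model.decision_function([features])
    return rankers
```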

Our method (phase 4): For each new query q:
We calculate its textual similarity w_qi with each cluster i, using the index from phase 2.
We produce one ranking R_qi per cluster, with r_qij being the rank of result j for query q after re-ranking the results with the ranking function trained on cluster i.
Final rank of result j: r_qj = Σ_i w_qi · r_qij.
In other words, w_qi expresses how similar the content of cluster i is to query q, and r_qij gives the rank of the result when the ranking function of cluster i is used. We combine all produced rankings, weighting each one according to how similar its cluster is to the query.
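A small sketch of the rank combination, assuming w_qi comes from the phase 2 sketch and r_qij from the per-cluster rankers; the exact aggregation used in the paper may differ in detail.

```python
# Sketch of phase 4: combine per-cluster rankings, weighted by query-cluster similarity.
def combine_rankings(cluster_weights, cluster_rankings):
    """cluster_weights: dict cluster id -> w_qi.
    cluster_rankings: dict cluster id -> dict result id -> rank r_qij (1 = best).
    Returns result ids sorted by the combined score (lower is better)."""
    combined = {}
    for cid, ranking in cluster_rankings.items():
        w = cluster_weights.get(cid, 0.0)
        for result_id, rank in ranking.items():
            combined[result_id] = combined.get(result_id, 0.0) + w * rank
    return sorted(combined, key=combined.get)
```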

Our method (overview figure)

Evaluation setup: A logging mechanism on top of the Google search engine recorded queries, result lists, clicked results, user IPs, and date and time.
Search topics: gadgets, cinema, auto/moto, life & health, science.
Users: 10 PhD students and researchers from our lab, with 1-3 search topics per user, over a 2-month search period and 671 queries in total. The first 501 queries were used as the training set and the last 170 queries as the test set.

Evaluation: Comparison of our method (T3) with two baselines: training a common ranking function for all users (T1) and training one ranking function per user (T2).
Results (shown as figures): the average change in rank between our method and the baseline methods, and the percentage of clicked results belonging to each cluster for each user.
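One plausible way to compute an "average change in rank" for the clicked results of a query is sketched below; this is an assumption about the measure, not the paper's exact definition.

```python
# Sketch: average change in rank of clicked results between a baseline ranking and ours.
# Positive values mean our method placed the clicked results closer to the top.
def average_rank_change(baseline_ranking, our_ranking, clicked_ids):
    """Rankings are lists of result ids, best first."""
    base_pos = {rid: i for i, rid in enumerate(baseline_ranking)}
    our_pos = {rid: i for i, rid in enumerate(our_ranking)}
    changes = [base_pos[rid] - our_pos[rid] for rid in clicked_ids
               if rid in base_pos and rid in our_pos]
    return sum(changes) / len(changes) if changes else 0.0
```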

Conclusion and future work: This is a first-cut approach to the problem: "collaborative" training of ranking functions for personalization based on topic areas, with encouraging results from preliminary experiments.
Extensions: more extensive experiments with (much) larger datasets, experiments to verify the homogeneity of the topic clusters and the efficiency of the method on very large datasets, and more performance measures (precision/recall).
Topic cluster inference: clustering on the feature vectors (and not on the text) of results, use of pre-defined topic hierarchies and classification techniques (e.g. ODP) for detecting topic areas, and dynamic switching between topic clusters by the users themselves.