

Query Log Analysis
Naama Kraus
Slides are based on the papers:
– Andrei Broder, A taxonomy of web search
– Ricardo Baeza-Yates, Graphs from Search Engine Queries
– Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search

A Taxonomy of Web Searches
[Andrei Broder] classifies web queries according to their intent:
– Navigational: reach a particular site. Examples: cnn, Oracle
– Informational: acquire some information. Examples: the history of haifa, information retrieval
– Transactional: perform some web-mediated activity, where further interaction is expected, e.g. shopping, downloading files, accessing databases. Examples: new balance shoes, Israel flights

Query Log
A search engine query log records users' searches. A typical record contains:
– Anonymous user id u
– Search query q
– Returned documents V
– Clicked documents C
– Timestamp t
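As an illustration, such a record can be modeled as a small immutable Python type. This is a sketch only: the field names (`user_id`, `query`, `returned`, `clicked`, `timestamp`), the example URLs, and the check that clicked documents are a subset of the returned ones are our assumptions, not part of any real log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryLogRecord:
    """One search in the log, with the fields listed on the slide."""
    user_id: str                # anonymous user id u
    query: str                  # search query q
    returned: tuple[str, ...]   # returned documents V, in rank order
    clicked: frozenset[str]     # clicked documents C
    timestamp: datetime         # timestamp t

    def __post_init__(self) -> None:
        # Assumption: a click can only land on a returned document.
        if not self.clicked <= set(self.returned):
            raise ValueError("clicked documents must be a subset of returned documents")


# Hypothetical record, values invented for illustration.
record = QueryLogRecord(
    user_id="1234",
    query="apple ipod",
    returned=("https://example.com/ipod", "https://example.org/ipod-review"),
    clicked=frozenset({"https://example.com/ipod"}),
    timestamp=datetime(2024, 1, 1, 12, 0),
)
```

If a log also records clicks outside the result list (for example on ads), the subset check would have to be dropped.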

Query Log Example
1234, apple, 12:
, apple ipod, 12:
ynet, 12:
google, 12:
eBay, 12:56 32
ynet news, 12:
Solaris systen, 13:
Solaris system, 13:05
…

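Session
A sequence of searches of one particular user u within a specific time limit:
$S = \langle (u, q_1, t_1), \ldots, (u, q_n, t_n) \rangle$
– $t_1 < t_2 < \cdots < t_n$ (ordered sequence)
– $t_{i+1} - t_i < t_0$, where $t_0$ is a timeout threshold
Note 1: a session may contain unrelated queries
Note 2: identifying sessions is easy, since only the user id and the timestamps are needed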
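Because a session depends only on the user and the timestamps, splitting one user's searches takes a single pass over the time-ordered log. A minimal Python sketch, assuming $t_0$ = 30 minutes (the value used in the session example); the function name `split_sessions`, the helper `at`, and the timestamps in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(searches, timeout=timedelta(minutes=30)):
    """Split one user's searches into sessions.

    `searches` holds (query, timestamp) pairs of a single user u. Searches are
    ordered by time, and a new session starts whenever the gap between two
    consecutive searches is at least the timeout t0.
    """
    sessions = []
    for query, ts in sorted(searches, key=lambda s: s[1]):
        if sessions and ts - sessions[-1][-1][1] < timeout:
            sessions[-1].append((query, ts))  # gap below t0: same session
        else:
            sessions.append([(query, ts)])  # first search, or gap >= t0
    return sessions


def at(hour, minute):
    """Hypothetical timestamps on an arbitrary day."""
    return datetime(2024, 1, 1, hour, minute)


log = [
    ("apple", at(12, 0)), ("apple ipod", at(12, 2)), ("ynet", at(12, 10)),
    ("apple store", at(12, 15)), ("cnn news", at(12, 20)),
    ("cnn webcast", at(12, 25)), ("apple apps", at(13, 1)),
]
sessions = split_sessions(log)
```

With these invented times, the 36-minute gap before "apple apps" exceeds the threshold, so the log splits into a six-query session followed by a one-query session.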

Session Example (timeout threshold = 30 minutes)
1234, apple, 12:
, apple ipod, 12:
ynet, 12:
apple store, 12:
cnn news, 12:
cnn webcast, 12:
apple apps, 13:01
Sessions: Session 1, Session 2

Query Chain
A sequence of queries with a similar information need of a particular user
– Also known as a mission or logical session
Example: haifa maps, haifa travel, attractions in haifa
Note 1: a chain contains related queries only
Note 2: identifying chains is difficult, since it requires judging whether queries serve the same information need

Query Chain Example
1234, apple, 12:
, apple ipod, 12:
ynet, 12:
apple store, 12:
cnn news, 12:
cnn webcast, 12:
apple apps, 13:01
Chains: chain1, chain2
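Chain detection has no simple rule, which is why it is harder than session splitting. To make the idea concrete, here is a deliberately naive heuristic of our own, not a method from the slides: a query joins an existing chain if it shares a term with it, otherwise it starts a new chain. The function name `naive_query_chains` is hypothetical, and synonyms, spelling variants, stop words and topic drift would all defeat it.

```python
def naive_query_chains(queries):
    """Group one session's queries into chains by shared terms.

    Hypothetical heuristic: a query joins an existing chain if it shares at
    least one term with any query already in that chain (the most recently
    started chain is checked first); otherwise it starts a new chain.
    """
    chains = []  # each chain: {"terms": set of terms, "queries": list}
    for q in queries:
        terms = set(q.lower().split())
        for chain in reversed(chains):
            if chain["terms"] & terms:
                chain["queries"].append(q)
                chain["terms"] |= terms
                break
        else:  # no chain shares a term: start a new one
            chains.append({"terms": set(terms), "queries": [q]})
    return [c["queries"] for c in chains]


session = ["apple", "apple ipod", "ynet", "apple store",
           "cnn news", "cnn webcast", "apple apps"]
chains = naive_query_chains(session)
```

On the example session this groups the four apple queries into one chain and the two cnn queries into another, leaving ynet on its own; the haifa queries from the previous slide all share "haifa" and form a single chain.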

Click Graph
A bipartite graph:
– Nodes on the left side are unique queries
– Nodes on the right side are unique URLs
– An edge (q, u) exists if the log contains a click on u for query q
– Edges may be weighted by the number of clicks
This graph is used by numerous algorithms for various purposes, e.g., query and URL clustering, query recommendations, …
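A click graph can be built by counting (query, URL) click pairs. A minimal Python sketch; the function names, the lower-casing of queries (our assumption for what makes a query "unique"), and the example clicks are hypothetical.

```python
from collections import Counter, defaultdict


def build_click_graph(clicks):
    """Weighted bipartite click graph.

    `clicks` is an iterable of (query, url) pairs, one per click in the log.
    Queries are stripped and lower-cased so that trivially different strings
    map to the same query node. Returns a Counter that maps each
    (query, url) edge to its number of clicks.
    """
    return Counter((q.strip().lower(), u) for q, u in clicks)


def adjacency(graph):
    """Adjacency sets for both sides: query -> URLs and URL -> queries."""
    q2u, u2q = defaultdict(set), defaultdict(set)
    for q, u in graph:
        q2u[q].add(u)
        u2q[u].add(q)
    return q2u, u2q


# Hypothetical click log entries.
clicks = [
    ("apple ipod", "https://example.com/ipod"),
    ("Apple iPod", "https://example.com/ipod"),
    ("ipod", "https://example.com/ipod"),
    ("apple store", "https://example.com/store"),
]
graph = build_click_graph(clicks)
q2u, u2q = adjacency(graph)
```

Queries that share a clicked URL, such as "apple ipod" and "ipod" here, are the kind of pairs that clustering and recommendation methods can connect through the URL side of the graph.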

Query Graphs
Each unique query is a node in the graph. The following slides describe connection types between queries (edges), as proposed by [Ricardo Baeza-Yates].

Query Graphs – Word Graph
– An edge exists between two nodes if the queries share common terms
– Possible node weight: number of occurrences in the log
– Possible edge weight: Jaccard similarity of the two queries' term sets
Example nodes: paris hotels, cheap paris hotels, paris attractions, london attractions
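A word graph over the example queries can be sketched in a few lines of Python, using the Jaccard coefficient $|A \cap B| / |A \cup B|$ of the term sets as edge weight. The occurrence counts and the function names are hypothetical; terms are taken as whitespace-separated words.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term sets."""
    return len(a & b) / len(a | b)


def build_word_graph(query_counts):
    """Word graph: nodes are unique queries weighted by occurrences; an
    undirected edge links two queries that share at least one term and is
    weighted by the Jaccard coefficient of their term sets."""
    terms = {q: set(q.split()) for q in query_counts}
    edges = {}
    for q1, q2 in combinations(sorted(query_counts), 2):
        if terms[q1] & terms[q2]:
            edges[(q1, q2)] = jaccard(terms[q1], terms[q2])
    return dict(query_counts), edges


# Hypothetical occurrence counts for the example nodes.
nodes, edges = build_word_graph({
    "paris hotels": 5, "cheap paris hotels": 2,
    "paris attractions": 3, "london attractions": 4,
})
```

Here "paris hotels" and "cheap paris hotels" share two of three terms (weight 2/3), while "paris hotels" and "london attractions" share none and get no edge.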

Query Graphs – Session Graph
– The weight of node q is the number of sessions that contain q (usually equal to the number of occurrences of q)
– A directed edge goes from q1 to q2 if q1 occurred before q2 in the same session
– The edge weight is the number of such occurrences
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
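A session graph can be sketched from a list of sessions, each a time-ordered list of queries. We read "q1 occurred before q2" as anywhere earlier in the session, not only immediately before, and skip self-loops from repeated queries; both are assumptions, as are the function name and the example sessions.

```python
from collections import Counter


def build_session_graph(sessions):
    """Session graph from time-ordered sessions of queries.

    Node weight: number of sessions containing the query (once per session).
    Directed edge q1 -> q2: q1 was submitted before q2 in the same session;
    edge weight: number of such occurrences across all sessions.
    """
    node_w, edge_w = Counter(), Counter()
    for session in sessions:
        node_w.update(set(session))  # count each query once per session
        for i, q1 in enumerate(session):
            for q2 in session[i + 1:]:
                if q1 != q2:  # skip self-loops from repeated queries
                    edge_w[(q1, q2)] += 1
    return node_w, edge_w


# Hypothetical sessions.
node_w, edge_w = build_session_graph([
    ["paris hotels", "cheap paris hotels", "paris attractions"],
    ["paris hotels", "paris attractions"],
    ["london attractions"],
])
```

Edges are directed, so the edge from "paris hotels" to "paris attractions" gets weight 2 while the reverse edge does not exist.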

Query Graphs – URL Cover Graph
– An edge exists between q1 and q2 if they share clicked URLs
– Node weight: number of occurrences
– Edge weight: number of common clicks
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
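A URL cover graph only needs, for each query, the set of URLs clicked for it. In this Python sketch the edge weight is the number of distinct clicked URLs two queries share, which is one reading of "common clicks" (weighting by click counts would be another). The function name, the occurrence counts and the example.com URLs are hypothetical.

```python
from itertools import combinations


def build_url_cover_graph(query_counts, clicked):
    """URL cover graph.

    query_counts: query -> number of occurrences (node weight).
    clicked: query -> set of URLs clicked for that query.
    An edge joins two queries that share at least one clicked URL; its weight
    here is the number of shared clicked URLs.
    """
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        shared = clicked[q1] & clicked[q2]
        if shared:
            edges[(q1, q2)] = len(shared)
    return dict(query_counts), edges


clicked = {
    "paris hotels": {"example.com/h1", "example.com/h2", "example.com/guide"},
    "cheap paris hotels": {"example.com/h1", "example.com/h2"},
    "paris attractions": {"example.com/guide", "example.com/louvre"},
    "london attractions": {"example.com/british-museum"},
}
nodes, edges = build_url_cover_graph(
    {"paris hotels": 5, "cheap paris hotels": 2,
     "paris attractions": 3, "london attractions": 4},
    clicked,
)
```

Unlike the word graph, this links "paris hotels" and "paris attractions" because users clicked the same guide page, not because the query strings overlap.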

Query Graphs – URL Link Graph
– An edge exists between q1 and q2 if there is at least one link between a URL clicked for q1 and a URL clicked for q2
– Node weight: number of occurrences
– Edge weight: number of such links
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
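A URL link graph additionally needs the hyperlinks between pages. In this Python sketch the hyperlinks are a hypothetical set of (source, target) pairs, and a link counts in either direction; the direction handling, the function name and all data are assumptions.

```python
from itertools import combinations


def build_url_link_graph(query_counts, clicked, links):
    """URL link graph.

    clicked: query -> set of clicked URLs; links: set of (source, target)
    hyperlinks. An edge joins q1 and q2 when at least one link connects a URL
    clicked for q1 with a URL clicked for q2, in either direction; its weight
    is the number of such links.
    """
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        n = sum(
            1
            for src, dst in links
            if (src in clicked[q1] and dst in clicked[q2])
            or (src in clicked[q2] and dst in clicked[q1])
        )
        if n:
            edges[(q1, q2)] = n
    return dict(query_counts), edges


clicked = {
    "paris hotels": {"example.com/h1"},
    "paris attractions": {"example.com/louvre"},
    "london attractions": {"example.com/british-museum"},
}
links = {
    ("example.com/h1", "example.com/louvre"),
    ("example.com/louvre", "example.com/h1"),
    ("example.com/louvre", "example.com/metro"),
}
nodes, edges = build_url_link_graph(
    {"paris hotels": 5, "paris attractions": 3, "london attractions": 4},
    clicked, links,
)
```

The two links between the hotel page and the Louvre page give the edge between "paris hotels" and "paris attractions" weight 2; the link to an unclicked page contributes nothing.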

Query Graphs – URL Terms Graph
– Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, or a combination …)
– Weight terms by their frequencies
– Node weight: number of occurrences
– An edge exists between q1 and q2 if at least m terms are common to at least one clicked URL of q1 and one clicked URL of q2
– Edge weight: sum of the frequencies of the common terms
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
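The slide leaves some details open, so this Python sketch fixes one possible reading: each clicked URL is a `Counter` of term frequencies, an edge exists if some pair of clicked URLs shares at least m terms, and the edge weight sums, over every such qualifying URL pair, the frequencies of the common terms in both pages. The weighting rule, the default `m=2`, the function name and all data are our assumptions.

```python
from collections import Counter
from itertools import combinations, product


def build_url_terms_graph(query_counts, clicked, url_terms, m=2):
    """URL terms graph (one possible reading of the slide).

    clicked: query -> set of clicked URLs.
    url_terms: URL -> Counter of term frequencies for that page.
    """
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        found, weight = False, 0
        for u1, u2 in product(clicked[q1], clicked[q2]):
            f1, f2 = url_terms[u1], url_terms[u2]
            common = f1.keys() & f2.keys()
            if len(common) >= m:  # this URL pair shares at least m terms
                found = True
                weight += sum(f1[t] + f2[t] for t in common)
        if found:
            edges[(q1, q2)] = weight
    return dict(query_counts), edges


url_terms = {
    "example.com/h1": Counter({"paris": 3, "hotel": 5, "cheap": 2}),
    "example.com/guide": Counter({"paris": 4, "hotel": 1, "museum": 2}),
    "example.com/bm": Counter({"london": 3, "museum": 4}),
}
clicked = {
    "cheap paris hotels": {"example.com/h1"},
    "paris attractions": {"example.com/guide"},
    "london attractions": {"example.com/bm"},
}
nodes, edges = build_url_terms_graph(
    {"cheap paris hotels": 2, "paris attractions": 3, "london attractions": 4},
    clicked, url_terms,
)
```

With m = 2, the hotel page and the guide page share "paris" and "hotel", giving weight (3 + 4) + (5 + 1) = 13, while the guide and British Museum pages share only "museum" and are not linked.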

User Behavior as a Predictor of a Successful Search
Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully or not
– Success: the user is satisfied with the results
– Failure: the user is unsatisfied
Method:
– Analyze the query log and learn success/failure patterns
– Use the learned models for prediction
Proposed by [Hassan, Jones and Klinkner]

Data
A rich query log of queries and user actions:
– Query (Q)
– Search Click (SR)
– Sponsored Search Click (AD)
– Related Search Click (RL): query recommendations
– Spelling Suggestion Click (SP)
– Shortcut Click (SC): e.g. image, video, news …
– Any Other Click (OTH): e.g. browser tab

Data Labeling
A random sample of user sessions was labeled by human editors, who:
– Detected logical sessions
– Labeled success/failure on a five-point scale: definitely successful, probably successful, unsure, probably unsuccessful, definitely unsuccessful

Markov Models
Partition the training data into two splits:
– successful goals
– unsuccessful goals
For each group, construct a Markov model derived from the observed action sequences:
– The model describes user behavior for a successful/unsuccessful search goal
– Each action type is a state
– Each transition from one state to another is weighted by its probability as observed in the data (maximum likelihood estimate, MLE)
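Training one such model amounts to counting transitions between consecutive actions. A minimal Python sketch, adding START and END states around each sequence as in the illustration slide; the function name and the labeled example sequences are hypothetical, and the models are stored as nested dictionaries of transition probabilities.

```python
from collections import Counter, defaultdict


def train_markov_model(sequences):
    """Estimate transition probabilities from action sequences by MLE.

    Each sequence is a list of action codes such as ["Q", "SR"]. START and
    END states are added around every sequence, and
    P(a -> b) = count(a -> b) / count(a -> any state).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        states = ["START", *seq, "END"]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }


# Hypothetical labeled goals, split by outcome as on the slide.
successful = [["Q", "SR"], ["Q", "SR", "SR"], ["Q", "Q", "SR"]]
unsuccessful = [["Q", "Q", "Q"], ["Q", "AD", "Q", "RL"]]
m_s = train_markov_model(successful)
m_f = train_markov_model(unsuccessful)
```

In this toy data, Q is followed by SR in 3 of its 4 outgoing transitions among successful goals, so the successful model gives Q to SR probability 0.75, while the unsuccessful model favors repeated queries (Q to Q with probability 0.4).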

Transition Weighting - MLE
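The formula on this slide did not survive extraction. Written out, the maximum-likelihood estimate described on the previous slide is the standard one for a first-order Markov chain; the notation N and W below is ours and may differ from the paper's. N(x, y) counts how often action y immediately follows action x in the training sequences of one group (successful or unsuccessful), and W(x, y) is the resulting transition weight.

```latex
W(x, y) = \frac{N(x, y)}{\sum_{z} N(x, z)}
```

For example, if Q is followed by SR 3 times and by Q once, then W(Q, SR) = 3/4.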

Illustration: state diagram of a Markov model over the states START, Q, SR, AD, RL and END

Prediction (1)
Given a user's action sequence, we need to predict whether it is successful or not.
– We have learned two models, $M_s$ and $M_f$, of successful and unsuccessful patterns
– Compute the probability that a given sequence $S = \langle S_1, \ldots, S_n \rangle$ was generated from $M_s$, and likewise from $M_f$
– Predict success or failure by computing the log likelihood (formulas on the next slide)
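Under the first-order Markov assumption, with $W_M(x, y)$ denoting the estimated probability of moving from state x to state y in model M, these quantities can be written as follows. This is our sketch of the standard likelihood-ratio rule, with START and END added as $S_0$ and $S_{n+1}$; the threshold $\tau$ is an assumption, and the paper's exact formulas may differ.

```latex
P(S \mid M) = \prod_{i=1}^{n+1} W_M(S_{i-1}, S_i), \qquad S_0 = \mathrm{START},\ S_{n+1} = \mathrm{END}

LL(S) = \log \frac{P(S \mid M_s)}{P(S \mid M_f)}
      = \sum_{i=1}^{n+1} \left[ \log W_{M_s}(S_{i-1}, S_i) - \log W_{M_f}(S_{i-1}, S_i) \right]

\text{predict success if } LL(S) > \tau, \text{ otherwise predict failure}
```

Taking logs turns the product into a sum, which avoids numerical underflow on long action sequences.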

Prediction (2) Formulas taken from the paper
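The likelihood-ratio rule can be sketched in Python with each model stored as nested dictionaries of transition probabilities. The transition tables below are invented for illustration, not taken from the paper, and the small floor `UNSEEN` for transitions never observed in training is our assumption (some form of smoothing is needed, since log 0 is undefined); the function names and the default threshold `tau=0.0` are hypothetical.

```python
import math

# Assumed probability for transitions never seen in training (avoids log(0)).
UNSEEN = 1e-6


def log_prob(model, actions):
    """log P(S | M) for a first-order Markov model given as nested dicts,
    with START and END states added around the action sequence."""
    states = ["START", *actions, "END"]
    return sum(
        math.log(model.get(a, {}).get(b, UNSEEN))
        for a, b in zip(states, states[1:])
    )


def predict_success(actions, m_s, m_f, tau=0.0):
    """Return (is_success, LL) with LL = log P(S|Ms) - log P(S|Mf)."""
    ll = log_prob(m_s, actions) - log_prob(m_f, actions)
    return ll > tau, ll


# Hypothetical transition tables.
m_s = {"START": {"Q": 1.0},
       "Q": {"SR": 0.75, "Q": 0.25},
       "SR": {"END": 0.75, "SR": 0.25}}
m_f = {"START": {"Q": 1.0},
       "Q": {"Q": 0.4, "END": 0.2, "AD": 0.2, "RL": 0.2},
       "AD": {"Q": 1.0},
       "RL": {"END": 1.0}}
```

With these tables, a query followed by a search click (`["Q", "SR"]`) is predicted successful, while three queries in a row with no click (`["Q", "Q", "Q"]`) are predicted unsuccessful. In practice, $\tau$ would be tuned on labeled data rather than fixed at 0.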