
Query Log Analysis
Naama Kraus
Slides are based on the papers:
– Andrei Broder, A Taxonomy of Web Search
– Ricardo Baeza-Yates, Graphs from Search Engine Queries
– Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search

A Taxonomy of Web Searches
[Andrei Broder] classifies web queries according to their intent:
– Navigational: reach a particular site. Examples: cnn, Oracle
– Informational: acquire some information. Examples: the history of haifa, information retrieval
– Transactional: perform some web-mediated activity, where further interaction is expected (e.g. shopping, downloading files, accessing databases). Examples: new balance shoes, Israel flights

Query Log
A search engine query log records users’ searches. A typical record contains:
– Anonymous user id u
– Search query q
– Returned documents V
– Clicked documents C
– Timestamp t

Query Log Example
1234, apple, 12:04
1234, apple ipod, 12:05
1234, ynet, 12:13
145, google, 12:20
145, eBay, 12:56
32, ynet news, 12:59
145, Solaris systen, 13:01
145, Solaris system, 13:05
…

Session
A sequence of searches by one particular user u within a specific time limit:
– $S = \langle (q_1, t_1), \ldots, (q_k, t_k) \rangle$
– $t_1 \le \ldots \le t_k$ (ordered sequence)
– $t_{i+1} - t_i \le t_0$ ($t_0$ is a timeout threshold)
Note 1: a session may contain unrelated queries
Note 2: identifying sessions is easy

Session Example
Timeout threshold = 30 minutes
Session 1:
1234, apple, 12:04
1234, apple ipod, 12:05
1234, ynet, 12:13
1234, apple store, 12:20
Session 2:
1234, cnn news, 12:56
1234, cnn webcast, 12:59
1234, apple apps, 13:01
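The timeout rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `split_sessions` and the helper `at` are hypothetical, and the records are the ones from the example above for user 1234.

```python
from datetime import datetime, timedelta


def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group one user's (query, time) records into sessions.

    A new session starts whenever the gap to the previous query
    exceeds `timeout` (the threshold t0).
    """
    sessions = []
    previous_time = None
    for query, time in sorted(records, key=lambda r: r[1]):
        if previous_time is None or time - previous_time > timeout:
            sessions.append([])  # gap too large: open a new session
        sessions[-1].append(query)
        previous_time = time
    return sessions


def at(hhmm):
    """Parse an HH:MM timestamp (hypothetical helper; the date is ignored)."""
    return datetime.strptime(hhmm, "%H:%M")


records = [
    ("apple", at("12:04")),
    ("apple ipod", at("12:05")),
    ("ynet", at("12:13")),
    ("apple store", at("12:20")),
    ("cnn news", at("12:56")),
    ("cnn webcast", at("12:59")),
    ("apple apps", at("13:01")),
]
sessions = split_sessions(records)
```

The only gap longer than 30 minutes is the one between 12:20 and 12:56, so the records fall into two sessions. Note that the second session still mixes cnn and apple queries, which is why sessions may contain unrelated queries.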

Query Chain
A sequence of queries by a particular user that share a similar information need
– Also known as a mission or logical session
– Example: haifa maps, haifa travel, attractions in haifa
Note 1: a chain contains related queries only
Note 2: identifying chains is difficult

Query Chain Example
1234, apple, 12:04
1234, apple ipod, 12:05
1234, ynet, 12:13
1234, apple store, 12:20
1234, cnn news, 12:56
1234, cnn webcast, 12:59
1234, apple apps, 13:01
[Slide figure: the records are grouped into chain1 and chain2]

Click Graph
– Bipartite graph
– Nodes on the left side are unique queries
– Nodes on the right side are unique URLs
– An edge connects q and u if the log contains a click on u for query q
– Edges may be weighted by the number of clicks
– Numerous algorithms use this graph for various purposes, e.g. query and URL clustering, query recommendations …
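As a rough sketch, a click graph can be held as a counter of (query, URL) pairs plus an adjacency map. The function name `build_click_graph` and the sample clicks below are made up for illustration and do not come from the slides.

```python
from collections import Counter, defaultdict


def build_click_graph(clicks):
    """Build a weighted bipartite click graph from (query, url) click pairs.

    Returns the edge weights (number of clicks per edge) and, for each
    query node, the set of URL nodes it is connected to.
    """
    edge_weight = Counter(clicks)
    urls_of_query = defaultdict(set)
    for query, url in edge_weight:
        urls_of_query[query].add(url)
    return edge_weight, dict(urls_of_query)


# Hypothetical click records: (query, clicked URL)
clicks = [
    ("apple", "apple.com"),
    ("apple", "apple.com"),
    ("apple ipod", "apple.com"),
    ("cnn", "cnn.com"),
]
edge_weight, urls_of_query = build_click_graph(clicks)
```

Here the edge between "apple" and "apple.com" gets weight 2, and the two apple queries are linked through the shared URL node, which is the kind of structure that clustering and recommendation methods exploit.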

Query Graphs
– Each unique query is a node in the graph
– Next slides: connection types between queries (edges)
– Proposed by [Ricardo Baeza-Yates]

Query Graphs – Word Graph
– An edge exists between two nodes if their queries share common terms
– Possible node weight: number of occurrences in the log
– Possible edge weight: Jaccard distance
Example nodes: paris hotels, cheap paris hotels, paris attractions, london attractions
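A minimal sketch of the word graph over the four example queries follows. It assumes whitespace tokenization and stores the Jaccard similarity of the term sets on each edge (the Jaccard distance is 1 minus this value); the function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """|a ∩ b| / |a ∪ b| for two non-empty term sets."""
    return len(a & b) / len(a | b)


def build_word_graph(queries):
    """Connect every pair of queries that share at least one term."""
    terms = {q: set(q.split()) for q in queries}  # assumption: split on whitespace
    edges = {}
    for q1, q2 in combinations(queries, 2):
        if terms[q1] & terms[q2]:
            edges[(q1, q2)] = jaccard_similarity(terms[q1], terms[q2])
    return edges


queries = [
    "paris hotels",
    "cheap paris hotels",
    "paris attractions",
    "london attractions",
]
edges = build_word_graph(queries)
```

With these queries, "paris hotels" and "london attractions" share no term and stay unconnected, while "paris attractions" links to both the paris queries and to "london attractions".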

Query Graphs – Session Graph
– The weight of node q is the number of sessions that contain the query q (usually equal to the number of query occurrences)
– A directed edge goes from q1 to q2 if q1 occurred before q2 in the same session
– The edge weight is the number of such occurrences
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
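The session graph can be sketched as below. Two assumptions are made here that the slide does not settle: "occurred before" is read as any earlier position in the session (not only the immediately preceding query), and each ordered pair is counted once per session. The function name and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_session_graph(sessions):
    """Build node and directed-edge weights from a list of sessions.

    Each session is a list of queries in the order they were issued.
    """
    node_weight = Counter()
    edge_weight = Counter()
    for session in sessions:
        # Node weight: number of sessions containing the query.
        node_weight.update(set(session))
        # combinations() keeps input order, so q1 always precedes q2.
        ordered_pairs = {
            (q1, q2) for q1, q2 in combinations(session, 2) if q1 != q2
        }
        edge_weight.update(ordered_pairs)
    return node_weight, edge_weight


# Hypothetical sessions
sessions = [
    ["paris hotels", "cheap paris hotels"],
    ["paris hotels", "paris attractions", "cheap paris hotels"],
]
node_weight, edge_weight = build_session_graph(sessions)
```

Restricting edges to consecutive queries only would be a small change: iterate over `zip(session, session[1:])` instead of `combinations(session, 2)`.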

Query Graphs – URL Cover Graph
– An edge exists between q1 and q2 if they share clicked URLs
– Node weight = number of occurrences
– Edge weight = number of common clicks
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions
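A possible sketch of the URL cover graph, assuming the clicks have already been aggregated into one set of clicked URLs per query and that the edge weight counts the distinct URLs two queries have in common. The function name and the URLs are invented for illustration.

```python
from itertools import combinations


def build_url_cover_graph(clicked_urls):
    """Connect queries that share clicked URLs.

    clicked_urls maps each query to the set of URLs clicked for it.
    The edge weight is the number of shared URLs.
    """
    edges = {}
    for q1, q2 in combinations(sorted(clicked_urls), 2):
        common = clicked_urls[q1] & clicked_urls[q2]
        if common:
            edges[(q1, q2)] = len(common)
    return edges


# Hypothetical clicked URLs per query
clicked_urls = {
    "paris hotels": {"hotels.example/paris", "travel.example/paris"},
    "cheap paris hotels": {"hotels.example/paris"},
    "paris attractions": {"travel.example/paris", "sights.example/paris"},
    "london attractions": {"sights.example/london"},
}
edges = build_url_cover_graph(clicked_urls)
```

Unlike the word graph, two queries are connected here only when users actually clicked the same page for both, regardless of whether the query strings share any term.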

Query Graphs – URL Link Graph
– An edge exists between q1 and q2 if there is at least one link between a clicked URL of q1 and a clicked URL of q2
– Node weight = number of occurrences
– Edge weight = number of such links
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions

Query Graphs – URL Terms Graph
– Represent each clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)
– Weight terms by their frequencies
– Node weight = number of occurrences
– An edge exists between q1 and q2 if at least m terms are common to at least one clicked URL of q1 and one clicked URL of q2
– Edge weight = sum of the frequencies of the common terms
Example nodes: paris hotels, paris attractions, cheap paris hotels, london attractions

User Behavior as a Predictor of a Successful Search
Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully or not
– Success: the user is satisfied with the results
– Failure: the user is unsatisfied
Method:
– Analyze the query log and learn success/failure patterns
– Use the learned models for prediction
Proposed by [Hassan, Jones and Klinkner]

Data
A rich query log of queries and user actions:
– Query (Q)
– Search Click (SR)
– Sponsored Search Click (AD)
– Related Search Click (RL): query recommendations
– Spelling Suggestion Click (SP)
– Shortcut Click (SC): e.g. image, video, news …
– Any Other Click (OTH): e.g. browser tab

Data Labeling
– Random sample of user sessions
– Human editors labeled the data:
  – Detected logical sessions
  – Rated success/failure as one of: definitely successful, probably successful, unsure, probably unsuccessful, definitely unsuccessful

Markov Models
– Partition the training data into two splits: successful goals and unsuccessful goals
– For each group, construct a Markov model derived from the observed action sequences
  – A model describes user behavior in the case of a successful/unsuccessful search goal
  – Each action type is a state
  – A transition from one state to another is weighted by its probability as observed in the data (MLE)

Transition Weighting - MLE
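The formula on this slide did not survive in the transcript. The standard maximum likelihood estimate for a first-order Markov chain is Pr(a → b) = N(a, b) / N(a), where N(a, b) counts observed transitions from state a to state b and N(a) counts all transitions leaving a. The sketch below applies that standard estimate; the function name, the explicit START/END padding, and the sample sequences are assumptions for illustration.

```python
from collections import Counter


def estimate_transitions(sequences):
    """MLE of transition probabilities: count(a -> b) / count(a -> anything)."""
    pair_counts = Counter()
    source_counts = Counter()
    for sequence in sequences:
        path = ["START", *sequence, "END"]  # assumed boundary states
        for a, b in zip(path, path[1:]):
            pair_counts[(a, b)] += 1
            source_counts[a] += 1
    return {
        (a, b): count / source_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical action sequences from one group (e.g. successful goals)
sequences = [
    ["Q", "SR"],
    ["Q", "SR", "SR"],
    ["Q", "AD"],
]
transitions = estimate_transitions(sequences)
```

Running the same function once on the successful goals and once on the unsuccessful goals would give the two models described on the previous slide.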

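Illustration
[Figure: Markov model over the states START, Q, SR, AD, RL, END; edge transition probabilities shown on the slide: 1, 0.3, 0.1, 0.6, 0.1, 0.4, 0.5, 1, 1]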

Prediction (1)
– Given a user’s action sequence, predict whether it is successful or not
– Two models have been learned: Ms for successful patterns and Mf for unsuccessful patterns
– Compute the probability that a given sequence S = {S1, …, Sn} was generated by Ms, and likewise for Mf
– Predict success or failure by comparing the log-likelihoods (formulas on the next slide)

Prediction (2) Formulas taken from the paper
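The formulas themselves are not preserved in this transcript. As a hedged sketch of the general idea, the log-likelihood of a sequence under a first-order Markov model is the sum of the log transition probabilities along it, and one simple decision rule is to predict success when the sequence is more likely under Ms than under Mf. The exact rule and any threshold used in the paper may differ. The function names, the `floor` value for unseen transitions, and both toy models are assumptions.

```python
import math


def log_likelihood(model, sequence, floor=1e-6):
    """Sum of log transition probabilities along START -> sequence -> END.

    Transitions missing from the model get a small `floor` probability
    (an assumed smoothing choice) to avoid log(0).
    """
    path = ["START", *sequence, "END"]
    return sum(math.log(model.get((a, b), floor)) for a, b in zip(path, path[1:]))


def predict_success(sequence, success_model, failure_model):
    """Predict success if the sequence is more likely under the success model."""
    return log_likelihood(success_model, sequence) > log_likelihood(
        failure_model, sequence
    )


# Hypothetical transition probabilities for Ms and Mf
success_model = {
    ("START", "Q"): 1.0, ("Q", "SR"): 0.8, ("Q", "Q"): 0.2,
    ("SR", "END"): 0.9, ("SR", "Q"): 0.1,
}
failure_model = {
    ("START", "Q"): 1.0, ("Q", "SR"): 0.3, ("Q", "Q"): 0.5, ("Q", "END"): 0.2,
    ("SR", "END"): 0.3, ("SR", "Q"): 0.7,
}

click_then_stop = predict_success(["Q", "SR"], success_model, failure_model)
repeated_queries = predict_success(["Q", "Q", "Q"], success_model, failure_model)
```

In this toy setup, a query followed by a result click and the end of the goal looks successful, while a run of reformulated queries with no click looks like a failure.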
