1
Context-Aware Query Classification
Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang
Microsoft Research Asia
SIGIR 2009
2010.04.27
Summarized and presented by Sang-il Song, IDS Lab., Seoul National University
2
Copyright 2010 by CEBT
Query Classification
Query Classification (QC)
  – Understanding the user's search intent
  – Classifying user queries into predefined target categories
Differences from traditional text classification
  – Queries are usually very short
  – Many queries are ambiguous and belong to multiple categories
Approaches
  – Augmenting the queries with extra data (search results)
  – Leveraging unlabeled data to improve the accuracy of supervised learning
  – Expanding the training data by automatically labeling queries in click-through data via self-training
These approaches do not consider the user's behavior history
3
Context Query Classification
Motivating example
  – Query "jaguar" without context: ambiguous whether the user is interested in "car" or "animal"
  – Query "jaguar" adjacent to "BMW" in the same session: clear that the user is interested in "car"
Context information
  – Adjacent queries
  – Clicked URLs
This paper models context information with a CRF
4
User Session
User search session
  – A series of observations
  – Each observation consists of a query and the set of URLs the user clicked for that query
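A session as described on this slide can be sketched as a small data structure. This is only an illustration: the class names `Observation` and `Session`, the helper `context_of`, and the pairing of each example query with one clicked URL are assumptions, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Observation:
    """One step of a search session: a query plus the URLs clicked for it."""
    query: str
    clicked_urls: Set[str] = field(default_factory=set)


@dataclass
class Session:
    """An ordered series of observations from one user."""
    observations: List[Observation] = field(default_factory=list)

    def context_of(self, t: int) -> List[Observation]:
        """Return the observations that precede position t (hypothetical helper)."""
        return self.observations[:t]


# Illustrative session; the query/URL pairing is assumed for the example.
session = Session([
    Observation("world cup", {"worldcup.fifa.com"}),
    Observation("fifa", {"fifa10.ea.com"}),
    Observation("fifa news", {"fifaworldcup.ea.com"}),
])
```

The first observation has an empty context, which is the situation the deck later calls the first-query problem.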
5
Taxonomy
Taxonomy
  – A tree of categories
  – Each node corresponds to a predefined category
6
Conditional Random Field
Undirected graphical model conditioned on the input sequence
  – p_ij depends on the feature functions
Motivation for using CRF
  – Suitable for capturing context information
  – Does not need any prior knowledge
  – Flexible enough to accommodate richer features
[Figure: graph over states s1, s2, s3, s4 with a potential p_ij for every pair of states (p_11 through p_44)]
7
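Context-Aware QC with CRF
[Figure: example session with the queries "world cup", "fifa", "fifa news" and the clicked URLs worldcup.fifa.com, fifa10.ea.com, fifaworldcup.ea.com; category labels: soccer, game]
  – Probabilities at the first query: 0.8, 0.2
  – Probabilities at the second query: 0.3, 0.7 (after the first label); 0.05, 0.95 (after the second label)
  – Probabilities at the third query: 0.7, 0.3 (after the first label); 0.4, 0.6 (after the second label)
  – Path probabilities after two queries: 0.24, 0.56, 0.01, 0.19
  – Path probabilities after three queries: 0.168, 0.072, 0.224, 0.336, 0.007, 0.003, 0.076, 0.114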
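The numbers on this slide are consistent with multiplying probabilities along each label path (for instance 0.8 × 0.3 = 0.24 and 0.24 × 0.7 = 0.168). The sketch below reproduces that arithmetic. Which index corresponds to "soccer" and which to "game" cannot be recovered from the transcript, so the labels are left as indices 0 and 1, and the variable names are assumptions.

```python
from itertools import product

# Probabilities read off the slide; index 0/1 stands for the two category labels.
initial = [0.8, 0.2]                   # first query
step2 = [[0.3, 0.7], [0.05, 0.95]]     # second query, given the first label
step3 = [[0.7, 0.3], [0.4, 0.6]]       # third query, given the second label


def path_probabilities():
    """Multiply the step probabilities along every label path."""
    two_step = [initial[a] * step2[a][b]
                for a, b in product(range(2), repeat=2)]
    three_step = [initial[a] * step2[a][b] * step3[b][c]
                  for a, b, c in product(range(2), repeat=3)]
    return two_step, three_step


two_step, three_step = path_probabilities()
print([round(p, 3) for p in two_step])
print([round(p, 3) for p in three_step])
```

The eight three-step values sum to 1, so under this reading the figure enumerates a full distribution over label sequences for the session.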
8
Conditional Probability
Category label sequence c
Observation sequence o
Conditional probability of c given o
  – Z(o): normalization factor
Potential function
  – f_k: feature function
  – λ_k: weight of f_k
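The equations on this slide were lost in extraction. The standard linear-chain CRF form, written with the symbols the slide names (Z(o), f_k and its weight), would read as follows. The session length T, the start label c_0, and the exact arguments of f_k are assumptions; the paper's own notation may differ.

```latex
\begin{align}
p(\mathbf{c} \mid \mathbf{o})
  &= \frac{1}{Z(\mathbf{o})} \prod_{t=1}^{T} \phi(c_{t-1}, c_t, \mathbf{o}) \\
\phi(c_{t-1}, c_t, \mathbf{o})
  &= \exp\Big( \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{o}) \Big) \\
Z(\mathbf{o})
  &= \sum_{\mathbf{c}'} \prod_{t=1}^{T} \phi(c'_{t-1}, c'_t, \mathbf{o})
\end{align}
```

Z(o) sums the same product over all candidate label sequences, which is what makes the left-hand side a proper probability.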
9
Training and Classification
Training
  – Given: training data
  – Objective: find a set of parameters that maximizes the conditional log-likelihood
Classification
  – Inferring the category label c_t for the test query
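The inference formula did not survive extraction. One common way to decode a linear-chain model is Viterbi search over the label sequence, sketched below. This is not confirmed to be the paper's procedure: the function name, the score dictionaries, and the toy "BMW"/"jaguar" numbers are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


def viterbi(local_scores: List[Dict[str, float]],
            transition_scores: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence.

    local_scores[t][c] is the log-score of label c at position t;
    transition_scores[(a, b)] is the log-score of label a followed by b
    (pairs that are missing score 0.0).
    """
    labels = list(local_scores[0])
    best = {c: local_scores[0][c] for c in labels}
    back = []  # back[t-1][c] = best previous label when position t has label c
    for t in range(1, len(local_scores)):
        new_best, pointers = {}, {}
        for c in labels:
            prev = max(labels,
                       key=lambda a: best[a] + transition_scores.get((a, c), 0.0))
            new_best[c] = (best[prev] + transition_scores.get((prev, c), 0.0)
                           + local_scores[t][c])
            pointers[c] = prev
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda c: best[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores (assumed): "BMW" strongly suggests car, "jaguar" alone leans animal.
local = [
    {"car": math.log(0.9), "animal": math.log(0.1)},    # query "BMW"
    {"car": math.log(0.45), "animal": math.log(0.55)},  # query "jaguar"
]
transitions = {("car", "car"): 1.0, ("animal", "animal"): 1.0}
print(viterbi(local, transitions))
```

Decoding the "jaguar" position on its own picks "animal", while decoding the two-query session picks "car" for both positions, which mirrors the motivating example earlier in the deck.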
10
Features
Type                 Feature                                                What does it use?
Local feature        Query terms
                     Pseudo feedback                                        External Web directory
                     Implicit feedback                                      External Web directory + click information
Contextual feature   Direct association between adjacent labels             Previous labels
                     Taxonomy-based association between adjacent labels     Taxonomy structure
11
Local Features
Query terms
  – Elementary feature
  – Too sparse: the training data cannot cover the terms sufficiently
Pseudo feedback
  – Uses the top M results returned by an external Web directory
  – Maps the category label of each result to a category in the target taxonomy
  – General label confidence: the number of returned search results whose category labels map to the given target category
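A minimal sketch of the general label confidence follows. The slide describes a count of results; dividing by the number of results M to obtain a fraction is an assumption made here, as are the function name, the directory labels, and the mapping table.

```python
from typing import Dict, List


def general_label_confidence(result_labels: List[str],
                             label_map: Dict[str, str],
                             target: str) -> float:
    """Fraction of the top-M directory results whose mapped label is `target`.

    result_labels: directory category of each returned result (length M).
    label_map: directory category -> target-taxonomy category (assumed given).
    """
    if not result_labels:
        return 0.0
    matches = sum(1 for label in result_labels if label_map.get(label) == target)
    return matches / len(result_labels)


# Hypothetical directory labels and mapping, for illustration only.
results = ["Sports/Soccer", "Sports/Soccer", "Games/Video", "Unmapped"]
mapping = {"Sports/Soccer": "soccer", "Games/Video": "game"}
print(general_label_confidence(results, mapping, "soccer"))
```

Results whose directory label has no counterpart in the target taxonomy simply contribute nothing to any category.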
12
Local Features (contd.)
Implicit feedback
  – Similar to pseudo feedback, but uses click information
  – Click-based label confidence score
Calculation
  1. Using the Web directory, get the corresponding categories
  2. Obtain a document collection for each possible query
  3. Build a Vector Space Model for each category
  4. Use the cosine similarity between the term vector of each category and the snippets of the clicked URLs
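The cosine-similarity step can be sketched with raw term counts. This is a simplification: the whitespace tokenizer, the absence of TF-IDF weighting, and the example texts are assumptions, and the slide does not say how similarities over several clicked snippets are combined.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts (naive whitespace tokenizer, assumed)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical category vector and clicked-URL snippet.
category_vec = term_vector("soccer football world cup match")
snippet_vec = term_vector("world cup soccer news")
print(cosine_similarity(category_vec, snippet_vec))
```

A snippet that shares many terms with a category's documents scores close to 1, and one that shares none scores 0.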
13
Contextual Features
Direct association between adjacent labels
  – Uses the occurrence of a pair of adjacent labels
  – The higher the weight, the larger the probability that one label transits into the next
Taxonomy-based association between adjacent labels
  – Because the training data is limited in size, some transitions may not occur in it
  – Uses the structure of the taxonomy
  – The association between two sibling categories is stronger than that between two non-sibling categories
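Both contextual signals can be illustrated with a few lines of code. This sketch assumes category labels are written as backslash-separated paths, as in the case study later in the deck. The function names and the label "Living\Food" are hypothetical, and how the paper turns these signals into feature values is not shown on the slide.

```python
from collections import Counter
from typing import List


def adjacent_pair_counts(label_sequences: List[List[str]]) -> Counter:
    """Count how often each ordered pair of adjacent labels occurs."""
    counts: Counter = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def are_siblings(a: str, b: str, sep: str = "\\") -> bool:
    """True when two distinct categories share the same parent path."""
    parent_a = a.rpartition(sep)[0]
    parent_b = b.rpartition(sep)[0]
    return a != b and parent_a != "" and parent_a == parent_b


travel = "Living\\Travel & Vacation"
local_info = "Information\\Local & Regional"
food = "Living\\Food"  # hypothetical label for illustration

# Two toy labeled sessions.
counts = adjacent_pair_counts([[local_info, travel], [travel, travel, food]])
print(counts[(travel, travel)], are_siblings(travel, food))
```

A pair such as (travel, food) that is rare or absent in the training sessions can still receive some association through the sibling test, which is the gap the taxonomy-based feature is meant to cover.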
14
Experimental Setup
Target taxonomy: taxonomy of ACM KDD Cup'05
  – 7 level-one categories
  – 67 level-two categories
Data set
  – 10,000 sessions extracted from one day's search log
  – Each session contains at least two queries
  – Three human labelers labeled the queries of each session
15
Baselines
Bridging classifier (BC)
  – Trains a classifier on an intermediate taxonomy
  – Bridges the queries and the target taxonomy in the online step of QC
  – Outperforms the winning approach in KDD Cup'05
Collaborating classifier (CC)
  – Naïve context-aware approach
  – Defines the score function of query q and category c using BC
  – Uses the current query, the past query, and the association between the previous category and the estimated category
16
Evaluation
For a test query
  – The true category label is given
  – The classification result is a set of the top K predicted category labels
Metrics
  – Recall
  – Precision
  – F1 score
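The metric formulas were lost in extraction. One plausible reading, assumed here rather than taken from the paper, counts a hit when the true label is among the top K predictions, divides hits by the number of predicted labels for precision and by the number of test queries for recall, and combines the two into F1.

```python
from typing import List, Tuple


def evaluate_top_k(true_labels: List[str],
                   predicted_top_k: List[List[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for top-K predictions (assumed definitions)."""
    hits = sum(1 for truth, preds in zip(true_labels, predicted_top_k)
               if truth in preds)
    n_predictions = sum(len(preds) for preds in predicted_top_k)
    precision = hits / n_predictions if n_predictions else 0.0
    recall = hits / len(true_labels) if true_labels else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data with K = 2 (illustrative labels only).
truth = ["soccer", "game", "soccer", "travel"]
top2 = [["soccer", "game"], ["soccer", "game"],
        ["game", "travel"], ["travel", "soccer"]]
precision, recall, f1 = evaluate_top_k(truth, top2)
print(precision, recall, f1)
```

Under these definitions, raising K can only raise recall, while precision is diluted by the extra predicted labels.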
17
Results
CRF-B: CRF with basic features
  – Query terms, general label confidence, and direct association between adjacent labels
CRF-B-C: CRF-B + click-based label confidence
CRF-B-C-T: CRF-B-C + taxonomy-based association
[Figure: the average overall recall]
18
Results (contd.)
[Figure: the average overall precision]
[Figure: the average overall F1 score]
19
Case Study
Without considering context, there are many possible search intents
  – General information about Santa Fe => Information\Local & Regional
  – Travel information about Santa Fe => Living\Travel & Vacation
20
Conclusions
A novel approach that leverages context information to classify queries by modeling search context through CRFs
This approach consistently outperforms both a non-context-aware baseline and a naïve context-aware baseline
The results demonstrate the effectiveness of context information
21
Discussions
Experiments on a real data set clearly show that this approach outperforms the non-context-aware baseline
The first-query problem
  – No search context can be found if the query is located at the beginning of the session
Experiments are too simple
  – Size of session
  – Height of taxonomy
22
Q & A
Thank you