Intent-Aware Semantic Query Annotation (SIGIR '17)
Rafael Glater, Rodrygo L. T. Santos, Nivio Ziviani
黄子贤, 2018.04.17
Preliminary
Metrics: P@10 (Precision at 10), MAP (Mean Average Precision), NDCG (Normalized Discounted Cumulative Gain)
State of the art:
  Entity search: FSDM (Fielded Sequential Dependence Model)
  LTR: LambdaMART -> Lambda + GBDT (Gradient Boosting Decision Tree)
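The three metrics can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the paper. It assumes labels are given in rank order, that DCG uses linear gains with a log2 discount, and that the ideal ranking for NDCG is built from the retrieved list only. MAP is the mean of `average_precision` over all queries.

```python
import math

def precision_at_k(rels, k=10):
    """Fraction of the top-k results that are relevant (rels: labels in rank order)."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels):
    """Mean of the precision values at each rank where a relevant result appears."""
    num_relevant = sum(1 for r in rels if r > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant

def dcg(gains, k):
    """Discounted cumulative gain with linear gains (an assumption of this sketch)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k=10):
    """DCG divided by the DCG of the same labels sorted in the ideal order."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

For a ranking with labels `[1, 0, 1, 0, 0]`, precision at 5 is 2/5 and average precision is (1/1 + 2/3) / 2.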
Research Motivation
Goal: improve the understanding of a query by annotating it with semantic information mined from a knowledge base.
Reason: over 70% of all queries contain a semantic resource, and almost 60% have a semantic resource as their primary target.
Research Motivation
How a query may be answered:
  By a single entity. Query: ben franklin -> <dbpedia:Ben_Franklin_(PX-15)>, <dbpedia:Benjamin_Franklin>
  By a list of entities of a single type. Query: US presidents since 1960 -> <dbpedia:Bill_Clinton>, <dbpedia:George_H._W._Bush>
  By an entity attribute. Query: England football player highest paid
  By an entity relation. Query: U.S. president authorise nuclear weapons against Japan
Technical Contributions
Four intent-specific query sets:
  E: entity queries (e.g., "Orlando Florida")
  T: type queries (e.g., "continents in the world")
  Q: question queries (e.g., "who created Wikipedia?")
  O: queries with other intents, including less represented ones such as relation queries and attribute queries
Core hypothesis: different queries may benefit from a ranking model optimized to their intent.
Query Intent Classification
Two feature groups: lexical features and semantic features.
Lexical features:
  Natural language queries are usually longer than others.
  POS tags can help identify question queries by indicating the presence of wh-pronouns.
Semantic features:
  A query seeking a specific entity probably returns fewer categories or ontology classes than one seeking a list of entities.
  "Eiffel" returns only 5 categories, while "list of films from the surrealist category" returns more than 103,000.
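The lexical side of the classifier can be illustrated with a small feature extractor. This is only a sketch: the feature names are hypothetical, and a fixed list of wh-words stands in for a real POS tagger, which the slides do not specify.

```python
# Hypothetical stand-in for POS tagging: a fixed list of English wh-words.
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}

def lexical_features(query):
    """Return simple lexical signals for intent classification."""
    tokens = query.lower().replace("?", " ").split()
    return {
        # Natural language queries tend to be longer than keyword queries.
        "num_tokens": len(tokens),
        "num_chars": len(query),
        # Wh-words hint at question queries.
        "has_wh_word": any(t in WH_WORDS for t in tokens),
        "starts_with_wh_word": bool(tokens) and tokens[0] in WH_WORDS,
    }
```

Applied to "who created Wikipedia?" this yields three tokens and a leading wh-word, while "Orlando Florida" yields two tokens and no wh-word.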
Intent-Specific Learning to Rank
Features:
  Content-based features
  Semantic features derived from the KG (query-independent)
Algorithm: LambdaMART
  Input space: the candidate set R_j is produced using BM25.
  Output space: provides relevance labels for each semantic resource r ∈ R_j.
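Producing a candidate set with BM25 can be sketched as follows. The slides do not say which BM25 variant or parameters were used, so this assumes the common Okapi form with k1 = 1.2, b = 0.75, a smoothed IDF, and whitespace tokenization. The function name and `top_n` cutoff are illustrative.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.2, b=0.75, top_n=100):
    """Rank docs (dict: id -> text) for a query; returns [(id, score)] best first."""
    if not docs:
        return []
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    scores = {}
    for d, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[d] = score  # only documents matching some query term are candidates
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

The returned list plays the role of R_j: the learned ranker then only has to label and reorder these candidates rather than the whole knowledge base.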
Intent-Specific Learning to Rank
Entity document
Three other fields:
  Ontology classes
  URL
  ALL: the concatenation of the available content from all fields
Intent-Aware Ranking Adaptation
Two strategies:
1. Intent-aware switching. For instance, if i_1 is predicted as the most likely intent for q, then P(i_1|q) = 1, P(i_2|q) = 0, P(i_3|q) = 0, and P(r|q) = P(r|q, i_1).
2. Intent-aware mixing. For instance: P(i_1|q) = 0.7, P(i_2|q) = 0.2, P(i_3|q) = 0.1.
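The two strategies can be contrasted in code. Switching follows the slide directly. For mixing, this sketch assumes the final score is the sum of the intent-specific scores weighted by P(i|q), which reduces to switching when one intent has probability 1. The function names and the example scores are hypothetical.

```python
def switch_scores(intent_probs, scores_by_intent):
    """Use only the ranking model of the most likely intent."""
    best = max(intent_probs, key=intent_probs.get)
    return dict(scores_by_intent[best])

def mix_scores(intent_probs, scores_by_intent):
    """Assumed mixing rule: P(r|q) = sum over intents i of P(i|q) * P(r|q, i)."""
    mixed = {}
    for intent, p in intent_probs.items():
        for resource, score in scores_by_intent[intent].items():
            mixed[resource] = mixed.get(resource, 0.0) + p * score
    return mixed

# Example with the intent probabilities from the slide and made-up scores.
intent_probs = {"i1": 0.7, "i2": 0.2, "i3": 0.1}
scores_by_intent = {
    "i1": {"r1": 0.9, "r2": 0.1},
    "i2": {"r1": 0.2, "r2": 0.8},
    "i3": {"r1": 0.5, "r2": 0.5},
}
switched = switch_scores(intent_probs, scores_by_intent)
mixed = mix_scores(intent_probs, scores_by_intent)
```

Mixing is the more forgiving choice when the intent classifier is uncertain, since a wrong top prediction does not discard the other intent-specific models entirely.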
Experimental setup
Perform a 5-fold cross-validation: 60 queries for training, 20 queries for validation, 20 queries for testing.
All results are reported as averages over all test queries across the cross-validation rounds.
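One way to obtain a 60/20/20 split in each of five rounds is to rotate five equal folds. The slides do not describe the exact rotation, so the assignment below (fold k for testing, the next fold for validation, the remaining three for training) is an assumption.

```python
def cross_validation_rounds(query_ids, num_folds=5):
    """Return one (train, validation, test) triple per round."""
    # Deal the queries into num_folds folds of (nearly) equal size.
    folds = [query_ids[i::num_folds] for i in range(num_folds)]
    rounds = []
    for k in range(num_folds):
        val_idx = (k + 1) % num_folds
        test = folds[k]
        validation = folds[val_idx]
        train = [q for j, fold in enumerate(folds)
                 if j not in (k, val_idx) for q in fold]
        rounds.append((train, validation, test))
    return rounds

# With 100 queries, every round has 60 training, 20 validation, 20 test queries.
rounds = cross_validation_rounds(list(range(100)))
```

Each query appears in exactly one test fold, so averaging over the test queries of all rounds covers the whole query set once.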
Experimental results
Intent Specificity
Q1: Do different intents benefit from different ranking models?
Top 5 features per ranking model.
Spearman's correlation coefficient for feature importance.
Feature importance evaluation:
  1(·) is the indicator function
  n_l (n_r) is the number of instances in the left (right) child of the splitting node n
  y_l (y_r) is the mean value assumed by the relevance label in the left (right) child of n
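The symbols defined here match the least-squares improvement commonly used to score splits in gradient-boosted trees: I(f) = sum over splitting nodes n of 1(f_n = f) * n_l * n_r / (n_l + n_r) * (y_l - y_r)^2, where f_n is the feature used at node n. Assuming that form (it is an assumption of this sketch, as are the tuple layout and the feature names), the importance of each feature can be accumulated as:

```python
from collections import defaultdict

def feature_importance(split_nodes):
    """Sum the least-squares improvement of every split, grouped by feature.

    split_nodes: iterable of (feature, n_l, n_r, y_l, y_r) tuples, one per
    splitting node across all trees of the ensemble.
    """
    importance = defaultdict(float)
    for feature, n_l, n_r, y_l, y_r in split_nodes:
        # The indicator 1(f_n = f) is realised by keying on the split feature.
        importance[feature] += (n_l * n_r) / (n_l + n_r) * (y_l - y_r) ** 2
    return dict(importance)

# Hypothetical splits from a small ensemble.
nodes = [
    ("bm25_all", 10, 10, 1.0, 0.0),
    ("bm25_all", 4, 6, 0.5, 0.0),
    ("num_categories", 5, 5, 2.0, 1.0),
]
scores = feature_importance(nodes)
```

Ranking the features of each intent-specific model by this score, and then comparing the rankings with Spearman's correlation, shows how differently the models weigh the same features.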
Experimental results
Intent Classification Accuracy
Q2: How accurately can we predict the intent of each query?
Query intent classification accuracy.
Semantic query annotation robustness for simulated intent classifiers across a range of accuracy levels.
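A simulated intent classifier with a chosen accuracy can be sketched as below. How the simulation was actually done is not stated in the slides. This version assumes the classifier returns the true intent with probability equal to the target accuracy and otherwise picks one of the remaining intents uniformly at random. The function name is hypothetical.

```python
import random

def simulate_intent_classifier(true_intent, accuracy,
                               intents=("E", "T", "Q", "O"), rng=random):
    """Return the true intent with probability `accuracy`, else a random wrong one."""
    if rng.random() < accuracy:
        return true_intent
    return rng.choice([i for i in intents if i != true_intent])

# Empirical check: about 80% of predictions should match the true intent.
rng = random.Random(0)
predictions = [simulate_intent_classifier("E", 0.8, rng=rng) for _ in range(10000)]
observed_accuracy = sum(p == "E" for p in predictions) / len(predictions)
```

Sweeping `accuracy` from 0 to 1 and re-running the ranking for each level gives a robustness curve of annotation effectiveness against classifier accuracy.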
Experimental results
Annotation Effectiveness
Q3: How effective is our semantic query annotation approach?
Experimental results
Effectiveness breakdown by query intent
Differences in nDCG@100 between LambdaMART (mixing) and LambdaMART (oblivious) across
Experimental results
Effectiveness breakdown by query length
Effectiveness breakdown by query difficulty
Conclusions
Contributions:
  An intent-aware framework for learning semantic query annotations from structured knowledge bases.
  An analysis of the specificity of several content and structural features for different query intents.
  A thorough validation of the proposed framework in terms of annotation effectiveness and robustness.
Core hypothesis: different queries may benefit from a ranking model optimized to their intent.
Future work: FSDM can be improved with an intent-aware approach to hyperparameter tuning.