Intent-Aware Semantic Query Annotation

Intent-Aware Semantic Query Annotation (SIGIR '17)
Rafael Glater, Rodrygo L. T. Santos, Nivio Ziviani
黄子贤 (Huang Zixian), 2018.04.17

Preliminary
Metrics: P@10 (Precision at rank 10), MAP (Mean Average Precision), NDCG (Normalized Discounted Cumulative Gain)
State of the art:
- Entity search: FSDM (Fielded Sequential Dependence Model)
- LTR: LambdaMART -> Lambda + GBDT (Gradient Boosting Decision Tree)
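The three metrics named on this slide can be sketched in a few lines. This is a minimal illustration, not the evaluation code used in the paper. It assumes `rels` is the list of graded relevance labels of the returned results in rank order, that any label above 0 counts as relevant, that the ideal ranking for NDCG is taken from the returned list only, and that the gain is `2**label - 1`. The function names are hypothetical.

```python
import math

def precision_at_k(rels, k=10):
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k

def average_precision(rels):
    """Mean of the precision values at each rank holding a relevant result.

    Assumption: every relevant item appears in `rels`; MAP is the mean of
    this value over all queries.
    """
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def dcg(rels, k):
    """Discounted cumulative gain with gain 2**label - 1 and log2 discount."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))

def ndcg(rels, k):
    """DCG divided by the DCG of the same labels sorted in ideal order."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal else 0.0
```

For example, `precision_at_k([1, 0, 1, 0], k=2)` counts one relevant result in the top two, and `ndcg` returns 1.0 whenever the list is already sorted by decreasing label.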

Research Motivation
Goal: improve the understanding of a query by annotating it with semantic information mined from a knowledge base.
Reason: over 70% of all queries contain a semantic resource, and almost 60% have a semantic resource as their primary target.

Research Motivation
- By a single entity. Query: ben franklin. <dbpedia:Ben_Franklin_(PX-15)> <dbpedia:Benjamin_Franklin>
- By a list of entities of a single type. Query: US presidents since 1960. <dbpedia:Bill_Clinton> <dbpedia:George_H._W._Bush>
- By entity attribute. Query: England football player highest paid
- By entity relation. Query: U.S. president authorise nuclear weapons against Japan

Technical Contributions
Four intent-specific query sets:
- E: entity queries (e.g., "Orlando Florida")
- T: type queries (e.g., "continents in the world")
- Q: question queries (e.g., "who created Wikipedia?")
- O: queries with other intents, including less represented ones such as relation queries and attribute queries
Core hypothesis: different queries may benefit from a ranking model optimized to their intent.

Technical Contributions

Query Intent Classification
Two groups of features: lexical and semantic.
Lexical features:
- Natural language queries are usually longer than others.
- POS tags can help identify question queries by indicating the presence of wh-pronouns.
Semantic features:
- A query seeking a specific entity probably returns fewer categories or ontology classes than a query seeking a list of entities.
- "Eiffel" returns only 5 categories, while "list of films from the surrealist category" returns more than 103,000.
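The lexical side of the classifier can be illustrated with a small feature extractor. This is only a sketch under stated assumptions: the slide mentions POS tags, but to stay dependency-free the code below stands in a fixed list of wh-words for a real POS tagger, and the feature names (`num_tokens`, `num_chars`, `has_wh_word`) are hypothetical rather than taken from the paper.

```python
# Assumption: a fixed wh-word list approximates the wh-pronoun POS tags
# mentioned on the slide; a real system would use a POS tagger.
WH_WORDS = {"who", "whom", "whose", "what", "which",
            "when", "where", "why", "how"}

def lexical_features(query):
    """Return simple length and wh-word features for a raw query string."""
    tokens = query.lower().replace("?", " ").split()
    return {
        "num_tokens": len(tokens),        # natural language queries tend to be longer
        "num_chars": len(query),
        "has_wh_word": any(t in WH_WORDS for t in tokens),
    }
```

With the slide's own examples, "who created Wikipedia?" yields three tokens and a wh-word, while "Eiffel" yields a single token and none. Feature vectors like this, combined with the semantic counts, would then feed whatever classifier predicts the intent.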

Intent-Specific Learning to Rank
Features: content-based features, and semantic features derived from the KG (query-independent)
Algorithm: LambdaMART
Input space: R_j is produced using BM25
Output space: provides relevance labels for each semantic resource r ∈ R_j
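To make the "R_j is produced using BM25" step concrete, here is a minimal BM25 ranker. It is a sketch, not the retrieval setup of the paper: the whitespace tokenizer, the parameters `k1=1.2` and `b=0.75`, the idf variant `ln(1 + (N - df + 0.5) / (df + 0.5))`, the cutoff `top_n`, and the function name `bm25_rank` are all assumptions.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.2, b=0.75, top_n=100):
    """Rank documents (id -> text) for a query with Okapi BM25.

    Returns up to top_n ids with a positive score, best first.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

On a toy collection with one short text per resource, the query "ben franklin" ranks a document containing both terms above one containing only "franklin", and drops documents with no matching term. The resulting list plays the role of the candidate set that the learned ranker then re-scores.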

Intent-Specific Learning to Rank
Entity document
Three other fields:
- Ontology classes
- URL
- ALL: concatenation of the available content from all fields

Intent-Aware Ranking Adaptation
Two strategies:
1. Intent-aware switching. For instance, if i_1 is predicted as the most likely intent for q: P(i_1 | q) = 1, P(i_2 | q) = 0, P(i_3 | q) = 0, so P(r | q) = P(r | q, i_1)
2. Intent-aware mixing. For instance: P(i_1 | q) = 0.7, P(i_2 | q) = 0.2, P(i_3 | q) = 0.1
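The two strategies can be contrasted in code. The slide gives the switching rule explicitly but only example probabilities for mixing, so the sketch below assumes mixing is the linear combination P(r | q) = Σ_i P(i | q) · P(r | q, i). It also assumes the intent-specific model outputs are available as comparable per-resource scores; the function names and the dictionary layout are hypothetical.

```python
def switch_scores(intent_probs, scores_by_intent):
    """Intent-aware switching: keep only the most likely intent's scores."""
    best = max(intent_probs, key=intent_probs.get)
    return dict(scores_by_intent[best])

def mix_scores(intent_probs, scores_by_intent):
    """Intent-aware mixing: weight each intent's scores by P(i | q).

    Assumption: a linear mixture over intents, not stated on the slide.
    """
    mixed = {}
    for intent, prob in intent_probs.items():
        for resource, score in scores_by_intent[intent].items():
            mixed[resource] = mixed.get(resource, 0.0) + prob * score
    return mixed

# Example with the probabilities from the slide and made-up scores.
intent_probs = {"i1": 0.7, "i2": 0.2, "i3": 0.1}
scores_by_intent = {
    "i1": {"a": 0.9, "b": 0.1},
    "i2": {"a": 0.2, "b": 0.8},
    "i3": {"a": 0.5, "b": 0.5},
}
switched = switch_scores(intent_probs, scores_by_intent)  # i1's scores only
mixed = mix_scores(intent_probs, scores_by_intent)        # a: 0.72, b: 0.28 (up to rounding)
```

Switching trusts the classifier completely, so a wrong intent prediction selects the wrong model outright. Mixing keeps a contribution from every model in proportion to the classifier's confidence.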

Experimental setup
Perform a 5-fold cross validation: 60 queries for training, 20 queries for validation, 20 queries for testing.
All results are reported as averages over all test queries across the cross-validation rounds.
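One way to realize this split is to cut the queries into five folds and, in each round, use three folds for training, one for validation and one for testing. The sketch below assumes that rotation scheme and a hypothetical set of 100 query ids, which reproduces the 60/20/20 sizes on the slide; the actual fold assignment used in the experiments is not given here.

```python
def five_fold_rounds(query_ids):
    """Yield (train, validation, test) query lists for 5 rotating rounds."""
    folds = [query_ids[i::5] for i in range(5)]  # round-robin fold assignment
    for r in range(5):
        test_fold = r
        valid_fold = (r + 1) % 5
        train = [q for j in range(5)
                 if j not in (test_fold, valid_fold)
                 for q in folds[j]]
        yield train, folds[valid_fold], folds[test_fold]

# Hypothetical query ids 0..99: each round gets 60 / 20 / 20 queries.
rounds = list(five_fold_rounds(list(range(100))))
```

Every query appears in exactly one test fold, so averaging over the test queries of all rounds covers the whole query set once.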

Experimental results: Intent Specificity
Q1: Do different intents benefit from different ranking models?
Top 5 features per ranking model.
Spearman's correlation coefficient for feature importance.
Feature importance evaluation, where:
- 1() is the indicator function
- n_l (n_r) is the number of instances in the left (right) child of the splitting node n
- y_l (y_r) is the mean value assumed by the relevance label in the left (right) child of n
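The importance formula itself was an image and does not survive in the transcript. The symbols that remain match the usual least-squares improvement for tree ensembles, so the sketch below assumes that form: for each split node n that tests feature f, add n_l · n_r / (n_l + n_r) · (y_l − y_r)², with the indicator function selecting the nodes that split on f. Treat this as an assumed reconstruction; the tuple layout and the feature names in the example are hypothetical.

```python
from collections import defaultdict

def feature_importance(split_nodes):
    """Sum the least-squares improvement of every split, grouped by feature.

    Each node is (feature, n_l, n_r, y_l, y_r): the feature tested at the
    node, the instance counts of the left and right child, and the mean
    relevance label in each child. Grouping by feature plays the role of
    the indicator function.
    """
    importance = defaultdict(float)
    for feature, n_l, n_r, y_l, y_r in split_nodes:
        importance[feature] += (n_l * n_r) / (n_l + n_r) * (y_l - y_r) ** 2
    return dict(importance)

# Made-up split nodes collected from all trees of one ranking model.
nodes = [
    ("bm25_all", 10, 10, 1.0, 0.0),
    ("query_len", 5, 15, 0.5, 0.0),
    ("bm25_all", 4, 4, 2.0, 1.0),
]
scores = feature_importance(nodes)  # bm25_all: 7.0, query_len: 0.9375
```

Sorting features by this score per intent-specific model gives the "top 5 features" view, and comparing the resulting rankings between models is where a rank correlation such as Spearman's comes in.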

Experimental results: Intent Classification Accuracy
Q2: How accurately can we predict the intent of each query?
Figure: query intent classification accuracy.
Figure: semantic query annotation robustness for simulated intent classifiers of a range of accuracy levels.

Experimental results: Annotation Effectiveness
Q3: How effective is our semantic query annotation approach?

Experimental results: Effectiveness breakdown by query intent
Differences in nDCG@100 between LambdaMART (mixing) and LambdaMART (oblivious) across

Experimental results
Effectiveness breakdown by query length
Effectiveness breakdown by query difficulty

Conclusions
Contributions:
- An intent-aware framework for learning semantic query annotations from structured knowledge bases.
- An analysis of the specificity of several content and structural features for different query intents.
- A thorough validation of the proposed framework in terms of annotation effectiveness and robustness.
Core hypothesis: different queries may benefit from a ranking model optimized to their intent.
Future work: FSDM can be improved with an intent-aware approach to hyperparameter tuning.