Named Entity Recognition in Query Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China 2 Microsoft Research.

Slides:



Advertisements
Similar presentations
A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Advertisements

Query Classification Using Asymmetrical Learning Zheng Zhu Birkbeck College, University of London.
Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li Presentation by Gonçalo Simões Course: Recuperação de Informação SIGIR 2009.
Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.
Enrich Query Representation by Query Understanding Gu Xu Microsoft Research Asia.
Chapter 5: Introduction to Information Retrieval
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Named Entity Mining From Click-Through Data Using Weakly Supervised LDA Gu Xu 1, Shuang-Hong Yang 1,2, Hang Li 1 1 Microsoft Research Asia, China 2 College.
Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin,Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval Yunhua Hu 1, Guomao Xin 2, Ruihua Song, Guoping Hu 3, Shuming.
IR Challenges and Language Modeling. IR Achievements Search engines  Meta-search  Cross-lingual search  Factoid question answering  Filtering Statistical.
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
INFO 624 Week 3 Retrieval System Evaluation
Web Logs and Question Answering Richard Sutcliffe 1, Udo Kruschwitz 2, Thomas Mandl University of Limerick, Ireland 2 - University of Essex, UK 3.
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
Co-training LING 572 Fei Xia 02/21/06. Overview Proposed by Blum and Mitchell (1998) Important work: –(Nigam and Ghani, 2000) –(Goldman and Zhou, 2000)
Collaborative Ordinal Regression Shipeng Yu Joint work with Kai Yu, Volker Tresp and Hans-Peter Kriegel University of Munich, Germany Siemens Corporate.
Cohort Modeling for Enhanced Personalized Search Jinyun YanWei ChuRyen White Rutgers University Microsoft BingMicrosoft Research.
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 30, (2014) BERLIN CHEN, YI-WEN CHEN, KUAN-YU CHEN, HSIN-MIN WANG2 AND KUEN-TYNG YU Department of Computer.
Dongyeop Kang1, Youngja Park2, Suresh Chari2
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Processing of large document collections Part 3 (Evaluation of text classifiers, applications of text categorization) Helena Ahonen-Myka Spring 2005.
Search Engines and Information Retrieval Chapter 1.
1 A Discriminative Approach to Topic- Based Citation Recommendation Jie Tang and Jing Zhang Presented by Pei Li Knowledge Engineering Group, Dept. of Computer.
 An important problem in sponsored search advertising is keyword generation, which bridges the gap between the keywords bidded by advertisers and queried.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
Improved search for Socially Annotated Data Authors: Nikos Sarkas, Gautam Das, Nick Koudas Presented by: Amanda Cohen Mostafavi.
Review of the web page classification approaches and applications Luu-Ngoc Do Quang-Nhat Vo.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Speaker: Ruirui Li 1 The University of Hong Kong.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
Probabilistic Query Expansion Using Query Logs Hang Cui Tianjin University, China Ji-Rong Wen Microsoft Research Asia, China Jian-Yun Nie University of.
Multiple Instance Real Boosting with Aggregation Functions Hossein Hajimirsadeghi and Greg Mori School of Computing Science Simon Fraser University International.
EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.
Math Information Retrieval Zhao Jin. Zhao Jin. Math Information Retrieval Examples: –Looking for formulas –Collect teaching resources –Keeping updated.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
Lecture 1: Overview of IR Maya Ramanath. Who hasn’t used Google? Why did Google return these results first ? Can we improve on it? Is this a good result.
Diversifying Search Result WSDM 2009 Intelligent Database Systems Lab. School of Computer Science & Engineering Seoul National University Center for E-Business.
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
1 A Probabilistic Model for Bursty Topic Discovery in Microblogs Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng CAS Key Laboratory of Web Data.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Institute of Computing Technology, Chinese Academy of Sciences 1 A Unified Framework of Recommending Diverse and Relevant Queries Speaker: Xiaofei Zhu.
A Unified and Discriminative Model for Query Refinement Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China.
Data Mining, ICDM '08. Eighth IEEE International Conference on Duy-Dinh Le National Institute of Informatics Hitotsubashi, Chiyoda-ku Tokyo,
1 A Biterm Topic Model for Short Texts Xiaohui Yan, Jiafeng Guo, Yanyan Lan, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences.
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
Advantages of Query Biased Summaries in Information Retrieval by A. Tombros and M. Sanderson Presenters: Omer Erdil Albayrak Bilge Koroglu.
Threshold Setting and Performance Monitoring for Novel Text Mining Wenyin Tang and Flora S. Tsai School of Electrical and Electronic Engineering Nanyang.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
More Than Relevance: High Utility Query Recommendation By Mining Users' Search Behaviors Xiaofei Zhu, Jiafeng Guo, Xueqi Cheng, Yanyan Lan Institute of.
Is Top-k Sufficient for Ranking? Yanyan Lan, Shuzi Niu, Jiafeng Guo, Xueqi Cheng Institute of Computing Technology, Chinese Academy of Sciences.
Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR.
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
A Novel Relational Learning-to- Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo.
1 Context-Aware Ranking in Web Search (SIGIR 10’) Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, Hang Li 2010/10/26.
Yiming Yang1,2, Abhay Harpale1 and Subramanian Ganaphathy1
Chinese Academy of Sciences, Beijing, China
Martin Rajman, Martin Vesely
Web Information retrieval (Web IR)
Information Retrieval and Web Design
Topic: Semantic Text Mining
Jinwen Guo, Shengliang Xu, Shenghua Bao, and Yong Yu
Presentation transcript:

Named Entity Recognition in Query Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China 2 Microsoft Research Asia, China

Outline Problem Definition Potential Applications Challenges Our Approach Experimental Results Summary

Outline Problem Definition Potential Applications Challenges Our Approach Experimental Results Summary

Problem Definition Named Entity Recognition in Query (NERQ) Identify Named Entities in Query and Assign them into Predefined Categories with Probabilities Harry Potter Harry Potter Walkthrough MovieBookGame MovieBookGame

Outline Problem Definition Potential Applications Challenges Our Approach Experimental Results Summary

NERQ in Searching Structured Data Games Books Unstructured Queries Structured Databases (Instant Answers, Local Search Index, Advertisements and etc) NERQ Module Smarter Dispatch This query prefers the results from the “Games” database Better Ranking “harry potter” should be used as key to match the records in the database, and further ranked by “walkthrough” harry potter walkthrough Movies

NERQ in Web Search Search results can be better if we know that “21 movie” indicates searcher wants the movie named 21

Outline Problem Definition Potential Applications Challenges Our Approach Experimental Results Summary

Challenges NER (Named Entity Recognition) – Well formed documents (e.g. news articles) – Usually a supervised learning method based on a set of features Context Feature: whether “Mr.” occurs before the word Content Feature: whether the first letter of words is capitalized NERQ – Queries are short (2-3 words on average) Less context features – Queries are not well-formed (typos, lower cased, …) Less content features

Outline Problem Definition Motivation and Potential Applications Challenges Our Approach Experimental Results Summary

Our Approach to NERQ Goal of NERQ becomes to find the best triple (e, t, c)* for query q satisfying Harry Potter Walkthrough “Harry Potter” (Named Entity) + “# Walkthrough” (Context) te “Game” Class c q

Training With Topic Model Ideal Training Data T = {(e i, t i, c i )} Real Training Data T = {(e i, t i, * )} – Queries are ambiguous (harry potter, harry potter review) – Training data are a relatively few

Training With Topic Model (cont.) harry potter kung fu panda iron man …………………… …………………… harry potter kung fu panda iron man …………………… …………………… # wallpapers # movies # walkthrough # book price …………………… …………………… # wallpapers # movies # walkthrough # book price …………………… …………………… # is a placeholder for name entity. # means “harry potter” here Movie Game Book …………………… Movie Game Book …………………… Topics etc

Weakly Supervised Topic Model Introducing Supervisions – Supervisions are always better – Alignment between Implicit Topics and Explicit Classes Weak Supervisions – Label named entities rather than queries (doc. class labels) – Multiple class labels (Binary Indicator) Kung Fu Panda MovieGameBook ? ? Distribution Over Classes

Weakly Supervised LDA (WS-LDA) LDA + Soft Constraints (w.r.t. Supervisions) Soft Constraints LDA Probability Soft Constraints Document Probability on the i -th Class Document Probability on the i -th Class Document Binary Label on the i -th Class Document Binary Label on the i -th Class Topic

System Flow Chat OnlineOffline Set of named entities with labels Create a “context” document for each seed and train WS-LDA Contexts Find new named entities by using obtained contexts and estimate p(c|e) using WS-LDA and p(e) Entities Input Query Evaluate each possible triple (e, t, c) Results

Outline Problem Definition Motivation and Potential Applications Challenges Our Approach Experimental Results Summary

Experimental Results Data Set – Query log data Over 6 billion queries and 930 million unique queries About 12 million unique queries – Seed named entities 180 named entities labeled with four classes 120 named entities are for training and 60 for testing

Experimental Results (cont.) NERQ Precision

Experimental Results (cont.) Named Entity Retrieval and Ranking – class distribution Aggregation of seed context distributions (Pasca, WWW07) p(t |c) from WS-LDA model – q(t |e) as entity distribution – Jensen-Shannon similarity between p(t |c) and q(t |e)

Experimental Results (cont.) Comparison with LDA – Class Likelihood of e :

Outline Problem Definition Motivation and Potential Applications Challenges Our Approach Experimental Results Summary

We first proposed the problem of named entity recognition in query. We formulized the problem into a probabilistic problem that can be solved by topic model. We devised weakly supervised LDA to incorporate human supervisions into training. The experimental results indicate that the proposed approach can accurately perform NERQ, and outperforms other baseline methods.

THANKS!