Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.



Motivation
- Named entities
  - are essential during the construction of knowledge bases from the Web
  - are helpful in various NLP tasks, such as parsing, coreference resolution, ...
  - constitute a significant part of Web search queries
  - are helpful in building verticals in Web search

Previous work
- Mining query logs
  - to improve various IR tasks
    - re-ranking of retrieved documents
    - query expansion
    - spelling correction
- Large-scale IE
  - mainly on document collections
  - ignoring the collective knowledge embedded in noisy search queries
- This is the first work that applies named entity finding to Web search query logs

Extraction from Query Logs
- Given
  - a set of target classes
  - a set of seed instances
- The goal
  - to extract relevant class instances from query logs
- Without using
  - any domain knowledge
  - any handcrafted extraction pattern

Overview of the System

Step 1: identification of query templates that match the seed instances

Step 2: identification of candidate instances

Step 3: internal representation of candidate instances
- query: prefix candidate_instance postfix
- entry: prefix postfix
- weight of an entry = frequency of the query
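Steps 1 to 3 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the system described in the paper: the query log is a hypothetical dict mapping query strings to frequencies, matching is done on whitespace tokens, and the names `find_templates` and `candidate_vectors` are invented for this example.

```python
from collections import Counter, defaultdict

def find_templates(query_freq, seeds):
    """Step 1: collect (prefix, postfix) pairs from queries that contain a seed."""
    templates = set()
    for query in query_freq:
        tokens = query.split()
        for seed in seeds:
            seed_tokens = seed.split()
            n = len(seed_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == seed_tokens:
                    prefix = " ".join(tokens[:i])
                    postfix = " ".join(tokens[i + n:])
                    if prefix or postfix:  # skip queries that are only the seed
                        templates.add((prefix, postfix))
    return templates

def candidate_vectors(query_freq, templates):
    """Steps 2-3: the infix of a matching query is a candidate instance.

    Each candidate gets a vector whose entries are (prefix, postfix) pairs
    and whose weights are the frequencies of the matching queries.
    """
    vectors = defaultdict(Counter)
    for query, freq in query_freq.items():
        tokens = query.split()
        for prefix, postfix in templates:
            p, s = prefix.split(), postfix.split()
            if len(tokens) <= len(p) + len(s):
                continue  # no room left for a candidate
            end = len(tokens) - len(s)
            if tokens[:len(p)] == p and tokens[end:] == s:
                candidate = " ".join(tokens[len(p):end])
                vectors[candidate][(prefix, postfix)] += freq
    return vectors

# Hypothetical toy query log (query -> frequency); not data from the paper.
QUERY_FREQ = {
    "lipitor side effects": 40,
    "buy lipitor online": 8,
    "aspirin side effects": 25,
    "buy aspirin online": 10,
    "paris side effects": 1,
}
TEMPLATES = find_templates(QUERY_FREQ, ["lipitor"])
VECTORS = candidate_vectors(QUERY_FREQ, TEMPLATES)
```

In this sketch the seed itself also receives a vector, because its queries match the templates they produced; how the actual system treats seeds at this stage is not stated on the slides.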

Step 4: internal representation of seed instances
- introduces weak supervision into the extraction process
- the vectors associated with the seed instances are merged into a reference search-signature vector
- the reference vector is a loose search fingerprint of the desired output type with respect to the class
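A minimal sketch of the merge in Step 4 follows. The slides do not say how the seed vectors are combined, so summing the weights entry by entry and normalizing to a distribution is an assumption made here; `merge_seed_vectors` and the sample vectors are hypothetical.

```python
from collections import Counter

def merge_seed_vectors(seed_vectors):
    """Merge per-seed vectors into one reference search-signature vector.

    Assumption: weights are summed per entry, then normalized to sum to 1.
    """
    merged = Counter()
    for vec in seed_vectors:
        merged.update(vec)  # Counter.update adds counts per key
    total = sum(merged.values())
    if total == 0:
        return {}
    return {entry: weight / total for entry, weight in merged.items()}

# Hypothetical vectors for two seeds of a class such as Drug.
SEED_VECTORS = [
    {("", "side effects"): 40, ("buy", "online"): 10},
    {("", "side effects"): 30, ("", "dosage"): 20},
]
REFERENCE = merge_seed_vectors(SEED_VECTORS)
```

Normalizing makes the reference vector directly comparable with candidate vectors under a distribution-based measure such as Jensen-Shannon.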

Step 5: instance ranking
- candidates are ranked by the similarity score (computed with Jensen-Shannon divergence) between each candidate vector and the class's reference vector
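The ranking in Step 5 can be illustrated with the standard Jensen-Shannon divergence, computed here with base-2 logarithms so that it lies between 0 (identical distributions) and 1 (disjoint distributions). Ranking by ascending divergence is an assumption; the slides do not say how the divergence is turned into a similarity score. The function names and sample vectors are hypothetical, and vectors are assumed to be non-empty.

```python
import math

def normalize(vec):
    """Scale a non-empty weight vector so that its weights sum to 1."""
    total = sum(vec.values())
    return {key: weight / total for key, weight in vec.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two weight vectors."""
    p, q = normalize(p), normalize(q)
    divergence = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution M = (P + Q) / 2
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence

def rank_candidates(candidates, reference):
    """Order candidate names from most to least similar to the reference."""
    return sorted(candidates, key=lambda name: js_divergence(candidates[name], reference))

# Hypothetical reference vector and candidate vectors.
REFERENCE = {"side effects": 7, "dosage": 3}
CANDIDATES = {
    "paris": {"hotels": 9, "side effects": 1},
    "aspirin": {"side effects": 6, "dosage": 4},
}
RANKED = rank_candidates(CANDIDATES, REFERENCE)
```

Here "aspirin" ranks ahead of "paris" because its query contexts overlap far more with the reference signature.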

Reference search-signature vectors
- A series of queries that can be asked about instances of a class
- Given a set of candidate phrases, the system guesses which candidate phrases are more likely to belong to the target class by looking at the queries

Experimental setting - 1
- Target classes
  - 10 classes with 5 seed instances for each class
  - City, Country, Drug, Food, Location, Movie, Newspaper, Person, University, VideoGame

Experimental setting - 2
- Data
  - A random sample of 50 million unique, fully anonymized queries submitted to Google
- Evaluation procedure
  - The top 250 candidates of each class are manually assigned a correctness label
    - 1: correct
    - 0: incorrect
  - Precision at rank N is calculated for several values of N
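Precision at rank N over such 0/1 labels is simply the fraction of correct items among the top N. A small sketch, with made-up labels rather than results from the paper and a hypothetical function name:

```python
def precision_at(labels, n):
    """Fraction of correct (label 1) items among the top n ranked candidates."""
    if not 0 < n <= len(labels):
        raise ValueError("n must be between 1 and the number of labeled candidates")
    return sum(labels[:n]) / n

# Made-up correctness labels for a ranked list of 10 candidates.
LABELS = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
P_AT_5 = precision_at(LABELS, 5)    # 3 correct out of 5
P_AT_10 = precision_at(LABELS, 10)  # 7 correct out of 10
```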

Quality of Extracted Instances

Is the popularity of seed instances in query logs correlated with precision?
- more queries with seed instances?
- more accurate scoring?
- better internal representation?

Comparing the usefulness of query logs vs. Web documents in NE finding
- M. Pasca. Acquisition of categorized named entities for Web search. CIKM 2004
- Target classes are incrementally acquired from Web documents, along with their respective instances, by using handcrafted extraction patterns (D-patt)
  - Class [such as|including] Instance
- Manual one-to-one mapping of chosen target classes to acquired classes
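To make the shape of a pattern like "Class [such as|including] Instance" concrete, here is a deliberately simplified regular expression. It is only an illustration of the idea, not the patterns used by the CIKM 2004 system: it assumes the class is a single lowercase word and the instance is a run of capitalized words, and the sample sentence is invented.

```python
import re

# Simplified stand-in for a "Class [such as|including] Instance" pattern.
# Assumptions: one lowercase class word, capitalized-word instance.
D_PATT = re.compile(
    r"\b([a-z]+) (?:such as|including) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def extract_pairs(text):
    """Return (class, instance) pairs matched by the simplified pattern."""
    return D_PATT.findall(text)

SENTENCE = (
    "He has visited cities such as New York and toured "
    "universities including Stanford last year."
)
PAIRS = extract_pairs(SENTENCE)
```

The contrast with the query-log approach is that a pattern of this kind has to be written by hand, whereas the seed-based method derives its query templates from the data.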

Comparing the usefulness of query logs vs. Web documents in NE finding
- Instances extracted from Web documents are also manually evaluated as correct or incorrect
- Seed-based extraction from queries outperformed D-patt in every class except City, Newspaper and Country

Conclusion
- Search queries, often thought of as noisy, keyword-based approximations of underspecified user information needs, proved to be useful for named entity discovery even with a small set of seed instances
- Absolute precision (and precision improvement relative to the Web-based handcrafted system):
  - 0.96 (29%) for
  - 0.90 (26%) for
  - 0.80 (15%) for

Questions?