1 Asking What No One Has Asked Before : Using Phrase Similarities To Generate Synthetic Web Search Queries CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG.

Slides:

Advertisements

Similar presentations

DQR : A Probabilistic Approach to Diversified Query recommendation Date: 2013/05/20 Author: Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Source:

Advertisements

Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :

Date: 2012/8/13 Source: Luca Maria Aiello. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Behavior-driven Clustering of Queries into Topics.

Knowledge Base Completion via Search-Based Question Answering

Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.

Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.

DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.

WSCD INTRODUCTION  Query suggestion has often been described as the process of making a user query resemble more closely the documents it is expected.

GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.

A New Suffix Tree Similarity Measure for Document Clustering Hung Chim, Xiaotie Deng City University of Hong Kong WWW 2007 Session: Similarity Search April.

Context-aware Query Suggestion by Mining Click-through and Session Data Authors: H. Cao et.al KDD 08 Presented by Shize Su 1.

Mining Query Subtopics from Search Log Data Date : 2012/12/06 Resource : SIGIR’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu.

The College of Saint Rose CIS 460 – Search and Information Retrieval David Goldschmidt, Ph.D. from Search Engines: Information Retrieval in Practice, 1st.

Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 7: Scores in a Complete Search.

Query session guided multi- document summarization THESIS PRESENTATION BY TAL BAUMEL ADVISOR: PROF. MICHAEL ELHADAD.

2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007.

Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.

Query Rewriting Using Monolingual Statistical Machine Translation Stefan Riezler Yi Liu Google 2010 Association for Computational Linguistics.

Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.

1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.

Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.

Information Retrieval and Web Search Text properties (Note: some of the slides in this set have been adapted from the course taught by Prof. James Allan.

AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.

A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.

Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.

Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Fine-Grained Location Extraction from Tweets with Temporal Awareness Date:2015/03/19 Author:Chenliang Li, Aixin Sun Source:SIGIR '14 Advisor:Jia-ling Koh.

Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.

1 Statistical source expansion for question answering CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Extracting meaningful labels for WEBSOM text archives Advisor.

Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.

Query Suggestion Naama Kraus Slides are based on the papers: Baeza-Yates, Hurtado, Mendoza, Improving search engines by query clustering Boldi, Bonchi,

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.

A Novel Pattern Learning Method for Open Domain Question Answering IJCNLP 2004 Yongping Du, Xuanjing Huang, Xin Li, Lide Wu.

1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)

A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.

1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG,

Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:

Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online.

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.

LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor ： Dr. Koh Jia-Ling Speaker ： Tu.

Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.

1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:

LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.

Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.

26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.

TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:

Semantic Grounding of Tag Relatedness in Social Bookmarking Systems Ciro Cattuto, Dominik Benz, Andreas Hotho, Gerd Stumme ISWC 2008 Hyewon Lim January.

SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.

MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.

Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

Using the Web for Language Independent Spellchecking and Auto correction Authors: C. Whitelaw, B. Hutchinson, G. Chung, and G. Ellis Google Inc. Published.

Web Intelligence and Intelligent Agent Technology 2008.

Usefulness of Quality Click- through Data for Training Craig Macdonald, ladh Ounis Department of Computing Science University of Glasgow, Scotland, UK.

Question Answering Passage Retrieval Using Dependency Relations (SIGIR 2005) (National University of Singapore) Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan,

CS791 - Technologies of Google Spring A Webbased Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.

Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.

User Modeling for Personal Assistant

Neighborhood - based Tag Prediction

Enriching Taxonomies With Functional Domain Knowledge

Clinically Significant Information Extraction from Radiology Reports

WSExpress: A QoS-Aware Search Engine for Web Services

Presentation transcript:

1 Asking What No One Has Asked Before : Using Phrase Similarities To Generate Synthetic Web Search Queries CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG

Outline Introduction Generating new queries – Aggregation into query templates – Generation of candidate phrase fillers – Filtering of candidate phrase fillers Experiment Conclusion 2

Introduction Introduces a method for automatically inferring meaningful, not-yet-submitted queries. 3 Known queries Others’ query log recommendation If the query isn’t in the known queries set ?

Introduction Motivation – enrich the sets of queries displayed as potential completions based on what Web search users have typed so far, reducing the time to search result. – Benefit for natural-language QA system 4

Introduction 5 Q1 Q2 Q3. query “lyrics of hey jude beatles” “lyrics of come together beatles” “lyrics of yesterday beatles” Query template lyrics of beatles :{yesterday, hey jude, come together} Known phrase Expand and filtering

Generating new queries Definitions: – A query Q is a sequence of tokens Q=[q 1,q 2 …,q i,...q NQ ], submitted as a search query by Web users. – A query template T is a generalization of a set of queries that share a common prefix and common postfix. – A template signature T S is a generalization of a set of query templates that, if their token sequences are reduced to token sets, become the same. Some tokens that belong to a fixed, global set of tokens deemed irrelevant may be discarded in the process. 6

Generating new queries Q a = “lyrics of hey jude beatles” q 1 q 2 q 3 q 4 q 5a prefix, non-empty infix, postfix lyrics, of hey jude, beatles lyrics of, hey jude, beatles lyrics of, hey, jude beatles. 7

Aggregation into query templates “lyrics of hey jude beatles” “lyrics of come together beatles” “lyrics of yesterday beatles” 8 lyrics of beatles :{yesterday, hey jude, come together} Query template Known phrase fillers

Generation of candidate phrase fillers Let DS(K i ) be the list of most distributionally similar phrases of a known phrase filler K i from a query template T. Any phrase U from DS(K i ) is considered as a candidate phrase filler U of the respective query template. The score of U relative to the entire set of known fillers 9

Generation of candidate phrase fillers 10 lyrics of beatles :{yesterday, hey jude, come together} DS(K 1 ) U 1 U 2 U 3. DS(K 2 ) U 1 U 3 U 4. DS(K 3 ) U 2 U 3 U 4. DS(K i ) = { U 1,U 2,U 3,U 4 } Query template : T Candidate phrase filler Sim(U 1,T) = ( DSscore(U 1,K 1 )+DSscore(U 1,K 2 )+DSscore(U 1,K 3 ) ) / 3 Sim(U 2,T) = ( DSscore(U 2,K 1 )+DSscore(U 2,K 2 )+DSscore(U 2,K 3 ) ) / 3 Sim(U 3,T) = ( DSscore(U 3,K 1 )+DSscore(U 3,K 2 )+DSscore(U 3,K 3 ) ) / 3 For each template T, its candidate phrase ﬁllers U are ranked in decreasing order of their scores.

11 Dsscore  similarity function  Cosine similarity Sim(U 1,T) = ( Cos(U 1,K 1 )+Cos(U 1,K 2 )+Cos(U 1,K 3 ) ) / 3 Sim(U 2,T) = ( Cos(U 2,K 1 )+Cos(U 2,K 2 )+Cos(U 2,K 3 ) ) / 3 Sim(U 3,T) = ( Cos(U 3,K 1 )+Cos(U 3,K 2 )+Cos(U 3,K 3 ) ) / 3 DS(K i ) word Feature 1 : context Feature 2 : context Feature 3 : context. Pair similarity(word i, word j ) Feature vector Top-K word i word1. word k Feature clusteringSimilar list

Filtering of candidate phrase fillers A canonical, keyword-based template signature T S tokens – Stopwords – POS tagging prepositions (of, for) determiners (the, an) auxiliary verbs (have, were) typical initial words in questions (who, how, where). The tokens not discarded are stemmed and ordered lexicographically in T S. 12

Filtering of candidate phrase fillers 13 lyrics of beatles beatl lyric lyrics the beatles beatles lyrics lyrics for by the beatles by beatles lyrics lyrics by the beatles

Filtering of candidate phrase fillers The list of candidate phrases of T is ﬁltered, by retaining only known phrases of some other templates T that share the same template signature T S as T. 14

Filtering of candidate phrase fillers something, earlier today, last friday, Gather together, eleanor rigby, two weeks ago, lucy in the sky with diamonds, strawberry ﬁelds forever, something else, here comes the sun, lovely rita. 15 Candidate phrase fillers of “lyrics of beatles :” Unfiltered list “lyrics the beatles ” eleanor rigby lucy in the sky with diamonds “beatles lyrics ” lovely rita here comes the sun Filtered list eleanor rigby lucy in the sky with diamonds lovely rita here comes the sun

Experiment Data Sources: The experiments rely on a random sample of around 100 million fully-anonymized queries in English submitted by Web users to Google in A phrase similarity repository is available. – Derived from unstructured text available within a sample of around 200 million documents in English. – The repository provides data for each of around 1 million phrases that occur as full-length queries in the input query logs. – It contains ranked lists of the top 200 phrases computed to be the most distributionally similar 16

Experiment Relative Coverage for Inferred Queries: – Left graph : the distribution of unique queries over query length in the input query logs. – Middle graph : the distribution of original queries that contribute to the creation of any query template and one of its known ﬁllers. – Right graph : is for inferred ﬁltered queries, which do not include any of the original queries among them. 17

Experiment Relative Coverage for Inferred Phrase Fillers: – the coverage over all query templates, computed as the percentage of known ﬁllers that occur among inferred ﬁllers. 18

Experiment a random sample of 800 TREC questions is manually inspected. – resulting question templates would be so general What is a fuel cell? (158) – If the question is too specific to generate any new queries from its possible templates In what year did joe dimaggio compile his 56-game hitting streak? (139) – Other questions A total of 503 of the 800 questions are thus converted into 457 unique question templates. 19

Experiment 20

Experiment Quality of Inferred Queries – Each phrase is manually assigned a correctness label within its respective query template Correct : it is a meaningful filler of the template Okay : it refers to a concept that would be a meaningful filler but as it stands grammatically disagrees with the template Misspelled : it is a meaningful filler but is misspelled Wrong : it is not a meaningful filler. 21

Experiment 22

Conclusion Redundancy within Web search queries allows for the generation of not-yet-submitted queries that correspond to meaningful user information needs. The analysis of query logs is limited to queries considered in isolation from one another; no search-result clicks or query sessions are needed. 23