Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.

Slides:

Advertisements

Similar presentations

Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.

Advertisements

Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.

Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions Date: 2013/02/18 Author: Umut Ozertem, Olivier Chapelle, Pinar Donmez,

Psychological Advertising: Exploring User Psychology for Click Prediction in Sponsored Search Date: 2014/03/25 Author: Taifeng Wang, Jiang Bian, Shusen.

Date: 2014/05/06 Author: Michael Schuhmacher, Simon Paolo Ponzetto Source: WSDM’14 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Knowledge-based Graph Document.

Date ： 2014/12/04 Author ： Parikshit Sondhi, ChengXiang Zhai Source ： CIKM’14 Advisor ： Jia-ling Koh Speaker ： Sz-Han,Wang.

Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.

A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date ： 2014/04/15 Source ： KDD’13 Authors ： Chi Wang, Marina Danilevsky, Nihit.

Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,

Time-sensitive Personalized Query Auto-Completion

DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.

Toward Whole-Session Relevance: Exploring Intrinsic Diversity in Web Search Date: 2014/5/20 Author: Karthik Raman, Paul N. Bennett, Kevyn Collins-Thompson.

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.

1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.

ISP 433/533 Week 2 IR Models.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

1 Ranked Queries over sources with Boolean Query Interfaces without Ranking Support Vagelis Hristidis, Florida International University Yuheng Hu, Arizona.

Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.

Modeling (Chap. 2) Modern Information Retrieval Spring 2000.

CONTI’2008, 5-6 June 2008, TIMISOARA 1 Towards a digital content management system Gheorghe Sebestyen-Pal, Tünde Bálint, Bogdan Moscaliuc, Agnes Sebestyen-Pal.

Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.

Search Engines and Information Retrieval Chapter 1.

Leveraging Conceptual Lexicon ： Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.

Information Retrieval and Web Search Text properties (Note: some of the slides in this set have been adapted from the course taught by Prof. James Allan.

1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)

Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,

Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.

RANKING SUPPORT FOR KEYWORD SEARCH ON STRUCTURED DATA USING RELEVANCE MODEL Date: 2012/06/04 Source: Veli Bicer(CIKM’11) Speaker: Er-gang Liu Advisor:

EasyQuerier: A Keyword Interface in Web Database Integration System Xian Li 1, Weiyi Meng 2, Xiaofeng Meng 1 1 WAMDM Lab, RUC & 2 SUNY Binghamton.

Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh 1.

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.

1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG,

Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

Query Segmentation Using Conditional Random Fields Xiaohui and Huxia Shi York University KEYS’09 (SIGMOD Workshop) Presented by Jaehui Park,

LOGO 1 Corroborate and Learn Facts from the Web Advisor ： Dr. Koh Jia-Ling Speaker ： Tu Yi-Lang Date ： Shubin Zhao, Jonathan Betz (KDD '07 )

1 Blog site search using resource selection 2008 ACM CIKM Advisor ： Dr. Koh Jia-Ling Speaker ： Chou-Bin Fan Date ：

LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor ： Dr. Koh Jia-Ling Speaker ： Tu.

A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.

1 Latent Concepts and the Number Orthogonal Factors in Latent Semantic Analysis Georges Dupret

Date: 2013/10/23 Author: Salvatore Oriando, Francesco Pizzolon, Gabriele Tolomei Source: WWW’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang SEED:A Framework.

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,

Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.

Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.

LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.

Presented By Amarjit Datta

Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.

DivQ: Diversification for Keyword Search over Structured Databases Elena Demidova, Peter Fankhauser, Xuan Zhou and Wolfgang Nejfl L3S Research Center,

Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,

Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.

Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.

CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR’12 Advisor: Dr.Jia-ling, Koh Speaker:

TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:

SEMANTIC VERIFICATION IN AN ONLINE FACT SEEKING ENVIRONMENT DMITRI ROUSSINOV, OZGUR TURETKEN Speaker: Li, HueiJyun Advisor: Koh, JiaLing Date: 2008/5/1.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology ACM SIGMOD1 Subsequence Matching on Structured Time Series.

Search Result Diversification in Resource Selection for Federated Search Date ： 2014/06/17 Author ： Dzung Hong, Luo Si Source ： SIGIR’13 Advisor: Jia-ling.

Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.

GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS ELENA HIRST.

Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.

CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.

Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.

Text Based Information Retrieval

Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,

Probabilistic Ranking of Database Query Results

Probabilistic Information Retrieval

WSExpress: A QoS-Aware Search Engine for Web Services

Presentation transcript:

Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness of Keyword Queries on Databases

Outline Introduction Ranking robustness principle A framework to measure structure robustness Experiments Conclusion 2

Databases contain entities, and entities contain attributes that take attribute value. Some of the difficulties of answering a query: User do not normally specify the desired schema element for each query term. Ex : Godfather=>title, company The schema of the output is not specified. Ex : Godfather=>movies, actor Introduction 3

Researchers have proposed some methods to detect difficult queries over plain text document. In this paper, it analyzes the characteristics of difficult queries over databases and propose a novel method to detect such queries. Introduction 4

Model a database as a set of entity sets. Entity set (S): a collection of entities E Entity (E): a set of attribute values A i Attribute value (A): belongs to an attribute T (A є T) Definition EX: movie : entity title : attribute Godfather : attribute value 5

Query (Q) : a set of terms Q = { q 1, q 2, ….q |Q| } Entity (E) is an answer to Q iff at least one of its attribute values contains a term q i in Q Given data base DB and query Q Retrieval function g(E, Q, DB) Ranked list L(Q, g, DB) Definition 6

Outline Introduction Ranking robustness principle A framework to measure structure robustness Experiments Conclusion 7

If a text retrieval method effectively ranks the answers to a query in a collection of text documents, it will also perform well for that query over the version of the collection that contains some errors such as repeated terms. A query to be more difficult if its rankings over the original and the corrupted versions of the data are less similar. Use Ranking Robustness Principle as a domain independent proxy metric to measure the degree of the difficulties of queries. Ranking Robustness Principle 8

The more diverse the candidate answers of a query are, the more difficult the query is. EX : more entities match the term in a query EX : each attribute describes a different aspect of an entity Ranking Robustness Principle 9

Term database appear in DB1 and DB2 EX : book =>150, article => 100, movie=>10 Much harder to decide the desired entity set in DB2. Take into account the distributions of the query term in the database as well. Ranking Robustness Principle DB1DB2 book movie article 10

Outline Introduction Ranking robustness principle A framework to measure structure robustness Experiments Conclusion 11

A corrupted version of DB can be seen as a random sample of such a probabilistic model. Model database DB as a triplet( S, T, A) V : the number of distinct terms in database DB Attribute value A a : X a = (X a,1, …., X a,v ) X a,j є X a is a random variable that represents the frequency of term w j in A a Probability mass function of X a : Structure robustness 12

Attribute value set A : X A = (X 1, …., X |A| ) X a єX A is a vector of size V that denotes the frequencies of terms in A a Probability mass function of X A : The domain of x is all |A|xV matrices : M(|A| x V) Similarly define X T and X s that model the set of attributes T and the set of entity sets S. Structure robustness 13

Structured Robustness score(SR score) g : retrieval function L : ranked list f X DB : noise generation model for database DB Sim : the Spearman rank correlation between the ranked answer lists Structured robustness calculation 14

Define the noise generation model f X DB (M) for database DB. Attribute value is corrupted by a combination of three corruption levels The value itself Its attribute Its entity set Noise generation 15

The noise generation model attribute value A : T t : attribute S s : entity set f Y t,j ( Y s,j ) : the probability of w j to appear y t,j times in T t f Z s,j ( Z s,j ) : the probability of w j to appear Z t,j times in S s 0≤γ A γ T, γ S ≤1 and γ A + γ T + γ S = 1 Noise generation 16

The attribute value A a is a small document, we mode f X i,j as a Poission distribution: λa,j : the frequency of j in attribute value A a Similarly, model the attribute T t and entity set S s: λt,j : the frquency of j in attribute T t λs,j : the frquency of j in entity set S s Noise generation 17

A mixture model overestimate the frequency of the terms that are relatively frequent in an attribute or entity set. Noise generation model: Noise generation 18

EX : Query term t : ancient Attribute T j : plot γ T = 0.9 Ancient occurs in attribute plot: 2132 times total number of attribute values under plot : => λ t,j = 2132 / = => Probability that term t occurs k times in a corrupted plot attribute : f Y t,j (x t, j ) = Noise generation 19

Top-k result Number of corruption iterations(N) Algorithm 20

Q11: difficult Noise generation 21

Q9:easy Noise generation 22

Outline Introduction Ranking robustness principle A framework to measure structure robustness Experiments Conclusion 23

Data set : INEX, Semantic Search Data file : select 20 Characteristics: Experiment 24

Ranking algorithm : PRMS Employs a language model approach Top-k : k = 10, k = 20 γ A, γ T, γ S : train(γ A, γ T, γ S ) by 5-fold cross validation INEX : (1, 0.9, 0.8 ) SemSearch : (1, 0.1, 0.6 ) Experiment 25

Experiment Q9 : mulan hua animation Q11 : ancient rome era 26

Experiment Q78 : sharp-pc Q90 : university of phoenix Q19 : carl lewis 27

Experiment 28

Outline Introduction Ranking robustness principle A framework to measure structure robustness Experiments Conclusion 29

Introduce the problem of prediction the degree of the difficulty for queried over databases. Set forth a principled framework and proposed novel algorithms to measure the degree of the difficulty of a query over a database. 30 Conclusion