Term Level Search Result Diversification
Date: 2013/09/11
Source: SIGIR '13
Authors: Van Dang, W. Bruce Croft
Advisor: Dr. Jia-Ling Koh
Speaker: Shun-Chen Cheng
Outline
  Introduction
  Topic Level Diversification
  Term Level Diversification
  Experiment
  Conclusions
Introduction
Diversification: produce a more diverse ranked list with respect to some set of topics or aspects associated with the query.
Introduction
Goal: does diversification with respect to these topics benefit from the additional structure or grouping of terms, or would diversification using the topic terms directly be just as effective?
Outline
  Introduction
  Topic Level Diversification
  Term Level Diversification
  Experiment
  Conclusions
Topic Level Diversification
[Diagram: given the search result list D1, D2, D3, D4, D5, D6, D7, D8, ..., a topic set t1 : w1, t2 : w2, t3 : w3, ..., tn : wn, and document-topic relevance estimates P(d|t), select a subset of size k, e.g., D1, D3, D4, D8.]
Two families of approaches:
  Diversity by Redundancy: xQuAD
  Diversity by Proportionality: PM-2
Ex of xQuAD (λ = 0.6, S = Ø). Document-topic relevance, with topic weights w1 = 0.8, w2 = 0.2:

        t1    t2
  d1   0.3   0.2
  d2   0.3   0.1
  d3   0.1   0.3
  d4   0.2   0.1
  d5   0.1   0.3

At each iteration xQuAD adds the document with the highest score
  score(d) = λ*P(d|q) + (1-λ)*Σt[ wt * P(d|t) * Π(d'∈S) (1 - P(d'|t)) ],
where each topic's contribution is discounted by how well the already-selected set S covers it. The arithmetic below shows only the diversity component; the relevance values P(d|q) were not shown on the slide.

1st iteration (S = Ø):
  d1 = 0.4*[0.8*0.3*1 + 0.2*0.2*1] = 0.4*0.28 = 0.112
  d2 = 0.4*[0.8*0.3*1 + 0.2*0.1*1] = 0.4*0.26 = 0.104
  d3 = 0.4*[0.8*0.1*1 + 0.2*0.3*1] = 0.4*0.14 = 0.056
  d4 = 0.4*[0.8*0.2*1 + 0.2*0.1*1] = 0.4*0.18 = 0.072
  d5 = 0.4*[0.8*0.1*1 + 0.2*0.3*1] = 0.4*0.14 = 0.056
  => select d1; S = {d1}

2nd iteration (t1 discounted by 1-0.3 = 0.7, t2 by 1-0.2 = 0.8):
  d2 = 0.4*[0.8*0.3*(1-0.3) + 0.2*0.1*(1-0.2)] = 0.4*[0.24*0.7 + 0.02*0.8] = 0.4*0.184 = 0.0736
  d3 = 0.4*[0.08*0.7 + 0.06*0.8] = 0.4*0.104 = 0.0416
  d4 = 0.4*[0.16*0.7 + 0.02*0.8] = 0.4*0.128 = 0.0512
  d5 = 0.4*[0.08*0.7 + 0.06*0.8] = 0.4*0.104 = 0.0416
  => select d2; S = {d1, d2}

3rd iteration (t1 discount 0.7*0.7 = 0.49, t2 discount 0.8*0.9 = 0.72):
  d3 = 0.4*[0.08*0.7*0.7 + 0.06*0.8*0.9] = 0.4*[0.08*0.49 + 0.06*0.72] = 0.4*0.0824 ≈ 0.033
  d4 = 0.4*[0.16*0.7*0.7 + 0.02*0.8*0.9] = 0.4*[0.16*0.49 + 0.02*0.72] = 0.4*0.0928 ≈ 0.037
  d5 = 0.4*[0.08*0.49 + 0.06*0.72] ≈ 0.033
  => the slide selects d3; S = {d1, d2, d3} (on the diversity component alone d4 scores higher here, so the omitted relevance term λ*P(d|q) must decide this step)

4th iteration (t1 discount 0.7*0.7*0.9 = 0.441, t2 discount 0.8*0.9*0.7 = 0.504):
  d4 = 0.4*[0.16*0.7*0.7*0.9 + 0.02*0.8*0.9*0.7] = 0.4*[0.16*0.441 + 0.02*0.504] ≈ 0.0323
  d5 = 0.4*[0.08*0.441 + 0.06*0.504] ≈ 0.0262
  => select d4; S = {d1, d2, d3, d4}
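A minimal Python sketch of the greedy xQuAD loop traced above. The function name and data layout are illustrative, and since the slides omit the P(d|q) values, uniform relevance is assumed here:

```python
# Greedy xQuAD selection: at each step add the document whose topics are
# least covered by the documents already selected.
from math import prod

def xquad(docs, rel, topics, doc_topic, lam=0.6, k=4):
    """docs: doc ids; rel: {d: P(d|q)}; topics: {t: weight};
    doc_topic: {(d, t): P(d|t)}."""
    selected, candidates = [], list(docs)
    while candidates and len(selected) < k:
        def score(d):
            div = sum(w * doc_topic.get((d, t), 0.0)
                      # discount topics the selected set already covers
                      * prod(1 - doc_topic.get((s, t), 0.0) for s in selected)
                      for t, w in topics.items())
            return lam * rel[d] + (1 - lam) * div  # slide's weighting, lam = 0.6
        selected.append(max(candidates, key=score))
        candidates.remove(selected[-1])
    return selected

docs = ["d1", "d2", "d3", "d4", "d5"]
rel = {d: 1.0 for d in docs}                      # assumption: uniform P(d|q)
topics = {"t1": 0.8, "t2": 0.2}
doc_topic = {("d1", "t1"): 0.3, ("d1", "t2"): 0.2,
             ("d2", "t1"): 0.3, ("d2", "t2"): 0.1,
             ("d3", "t1"): 0.1, ("d3", "t2"): 0.3,
             ("d4", "t1"): 0.2, ("d4", "t2"): 0.1,
             ("d5", "t1"): 0.1, ("d5", "t2"): 0.3}
print(xquad(docs, rel, topics, doc_topic))        # ['d1', 'd2', 'd4', 'd3']
```

With uniform relevance the third pick goes to d4 on diversity alone; the slide's order (d3 third) depends on the actual P(d|q) values, which the slide does not show.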
Ex of PM-2 (λ = 0.6; same document-topic table; topic weights w1 = 0.8, w2 = 0.2):

        t1    t2
  d1   0.3   0.2
  d2   0.3   0.1
  d3   0.1   0.3
  d4   0.2   0.1
  d5   0.1   0.3

For each position: compute each topic's quotient qt[i] = wi/(2*si + 1); award the position to the topic t* with the largest quotient; select the document maximizing λ*qt[t*]*P(d|t*) + (1-λ)*Σ(i≠t*) qt[i]*P(d|ti); then update each seat count si += P(d*|ti)/Σj P(d*|tj).

For 1st position (s1 = 0, s2 = 0): qt[1] = 0.8/(2*0+1) = 0.8, qt[2] = 0.2/(2*0+1) = 0.2 => t* = t1
  d1 = 0.6*0.8*0.3 + 0.4*[0.2*0.2] = 0.144 + 0.016 = 0.16
  d2 = 0.6*0.8*0.3 + 0.4*[0.2*0.1] = 0.144 + 0.008 = 0.152
  d3 = 0.6*0.8*0.1 + 0.4*[0.2*0.3] = 0.048 + 0.024 = 0.072
  d4 = 0.6*0.8*0.2 + 0.4*[0.2*0.1] = 0.096 + 0.008 = 0.104
  d5 = 0.6*0.8*0.1 + 0.4*[0.2*0.3] = 0.072
  => select d1; s1 = 0 + 0.3/(0.3+0.2) = 0.6, s2 = 0 + 0.2/(0.3+0.2) = 0.4

For 2nd position (s1 = 0.6, s2 = 0.4): qt[1] = 0.8/2.2 ≈ 0.36, qt[2] = 0.2/1.8 ≈ 0.11 => t* = t1
  d2 = 0.6*0.36*0.3 + 0.4*[0.11*0.1] = 0.0648 + 0.0044 = 0.0692
  d3 = 0.6*0.36*0.1 + 0.4*[0.11*0.3] = 0.0216 + 0.0132 = 0.0348
  d4 = 0.6*0.36*0.2 + 0.4*[0.11*0.1] = 0.0432 + 0.0044 = 0.0476
  d5 = 0.0348
  => select d2; s1 = 0.6 + 0.3/(0.3+0.1) = 1.35, s2 = 0.4 + 0.1/(0.3+0.1) = 0.65

For 3rd position (s1 = 1.35, s2 = 0.65): qt[1] = 0.8/3.7 ≈ 0.217, qt[2] = 0.2/2.3 ≈ 0.087 => t* = t1
  d3 = 0.6*0.217*0.1 + 0.4*[0.087*0.3] = 0.0130 + 0.0104 = 0.0234
  d4 = 0.6*0.217*0.2 + 0.4*[0.087*0.1] = 0.0260 + 0.0035 = 0.0295
  d5 = 0.0234
  => select d4; s1 = 1.35 + 0.2/(0.2+0.1) ≈ 2.01, s2 = 0.65 + 0.1/(0.2+0.1) ≈ 0.98

For 4th position (s1 = 2.01, s2 = 0.98): qt[1] = 0.8/5.02 ≈ 0.159, qt[2] = 0.2/2.96 ≈ 0.068 => t* = t1
  d3 = 0.6*0.159*0.1 + 0.4*[0.068*0.3] ≈ 0.0177
  d5 = 0.6*0.159*0.1 + 0.4*[0.068*0.3] ≈ 0.0177 (tie with d3)
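A matching sketch of PM-2's position-by-position loop, with illustrative names and the same data layout as the xQuAD sketch above:

```python
# PM-2: a Sainte-Lague quotient decides which topic "deserves" each rank
# position; seat counts are then updated fractionally by the chosen document.

def pm2(docs, topics, doc_topic, lam=0.6, k=4):
    """docs: doc ids; topics: {t: weight}; doc_topic: {(d, t): P(d|t)}."""
    seats = {t: 0.0 for t in topics}
    ranking, candidates = [], list(docs)
    while candidates and len(ranking) < k:
        qt = {t: w / (2 * seats[t] + 1) for t, w in topics.items()}
        t_star = max(qt, key=qt.get)              # topic awarded this position
        def score(d):
            return (lam * qt[t_star] * doc_topic.get((d, t_star), 0.0)
                    + (1 - lam) * sum(qt[t] * doc_topic.get((d, t), 0.0)
                                      for t in topics if t != t_star))
        best = max(candidates, key=score)
        ranking.append(best)
        candidates.remove(best)
        total = sum(doc_topic.get((best, t), 0.0) for t in topics)
        for t in topics:                          # fractional seat update
            seats[t] += doc_topic.get((best, t), 0.0) / total
    return ranking
```

On the table above this yields d1, d2, d4, d3, matching the trace (d3 and d5 tie at the fourth position; the tie is broken by candidate order).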
Outline
  Introduction
  Topic Level Diversification
  Term Level Diversification
  Experiment
  Conclusions
Term Level Diversification
Based on the assumption that if a document is more relevant to one topic than to another, it is also more relevant to the terms associated with the first topic than to any of the terms from the other topic.
Term Level Diversification
Instead of diversifying R using the set of topics T, perform diversification using the set of topic terms T', treating each term as a topic (see the sketch below).
Ex:
  Topic level: T = { "treat joint pain", "woodwork joint type" }
  Term level: T' = { treat, joint, pain, woodwork, type }
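A two-line sketch of this flattening, using the slide's example aspects:

```python
# T' is the union of the terms appearing in the topic descriptions,
# with each term then treated as a topic in its own right.
topics = ["treat joint pain", "woodwork joint type"]
t_prime = {term for topic in topics for term in topic.split()}
print(sorted(t_prime))  # ['joint', 'pain', 'treat', 'type', 'woodwork']
```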
Term Level Diversification
Vocabulary (V) identification:
  1. appears in at least two documents
  2. at least two characters
  3. not a number
Two types of terms: unigrams and phrases.
Topic term (T) identification: all vocabulary terms that co-occur with any of the query terms within a proximity window of size w (see the sketch below).
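A hedged sketch of this extraction step, assuming simple whitespace tokens and unigram terms only (phrase handling and the exact window semantics in the paper may differ); `extract_topic_terms` and the default `w=15` are illustrative:

```python
# Vocabulary filtering plus proximity-window topic-term identification.
from collections import Counter

def extract_topic_terms(docs, query_terms, w=15):
    """docs: list of token lists (top-k retrieved docs);
    returns candidate topic terms near any query term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))                   # document frequency
    # vocabulary filter: >= 2 docs, >= 2 characters, not a number
    vocab = {t for t, n in df.items()
             if n >= 2 and len(t) >= 2 and not t.isdigit()}
    topic_terms = set()
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok in query_terms:
                # any vocabulary term within w positions of a query term
                lo, hi = max(0, i - w), min(len(tokens), i + w + 1)
                topic_terms.update(t for t in tokens[lo:hi] if t in vocab)
    return topic_terms - set(query_terms)
```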
Term Level Diversification
Topicality and predictiveness:
  Topicality: how informative a term is at describing the set of documents.
  Predictiveness: how much the occurrence of a term predicts the occurrences of other terms.
The DSPApprox algorithm greedily selects the topic terms with the highest product TP(t)*PR(t).
Ex of the DSPApprox algorithm. Q = { } (the query terms are not shown), V = {v1, v3, v4, v6, v7, v9}, candidate topic terms T = {v2, v5, v8}, written t1, t2, t3 below. Each candidate's covered vocabulary: Ct1 = {v1, v3, v4, v7, v9}, Ct2 = {v4, v7, v9}, Ct3 = {v6}.

Given probabilities:
  P(t1|q) = 0.1, P(t2|q) = 0.7, P(t3|q) = 0.2
  Pc(t1) = 0.1, Pc(t2) = 0.6, Pc(t3) = 0.3
  Pw(t1|v1) = 0.1, Pw(t1|v3) = 0.5, Pw(t1|v4) = 0.2, Pw(t1|v7) = 0.8, Pw(t1|v9) = 0.1
  Pw(t2|v4) = 0.7, Pw(t2|v7) = 0.5, Pw(t2|v9) = 0.9
  Pw(t3|v6) = 0.6

Initialize DTT = Ø, PRED = Ø. Each iteration picks t* = argmax TP(t)*PR(t); log is base 10 below.

1st iteration:
  TP(t1) = 0.1*log(0.1/0.1) = 0; PR(t1) = (1/5)*(0.1+0.5+0.2+0.8+0.1) = 0.2*1.7 = 0.34 => TP(t1)*PR(t1) = 0
  TP(t2) = 0.7*log(0.7/0.6) ≈ 0.047; PR(t2) = (1/3)*(0.7+0.5+0.9) = 0.33*2.1 ≈ 0.693 => 0.047*0.693 ≈ 0.033
  TP(t3) = 0.2*log(0.2/0.3) ≈ -0.035; PR(t3) = 1*0.6 = 0.6 => -0.035*0.6 = -0.021
  => t* = t2; DTT = {t2}; pred = Ct2 = {v4, v7, v9}
  Update PR(ti) by subtracting the mass of the newly predicted vocabulary:
  PR(t1) = 0.34 - Pw(t1|v4) - Pw(t1|v7) - Pw(t1|v9) = 0.34 - 1.1 = -0.76
  PR(t3) = 0.6 - Pw(t3|v4) - Pw(t3|v7) - Pw(t3|v9) = 0.6 - 0 = 0.6
  PRED = {v4, v7, v9}

2nd iteration:
  TP(t1) = 0; PR(t1) = -0.76 => TP(t1)*PR(t1) = 0
  TP(t3) ≈ -0.035; PR(t3) = 0.6 => TP(t3)*PR(t3) = -0.021
  => t* = t1; DTT = {t1, t2}; pred = Ct1 = {v1, v3, v4, v7, v9} (only v1 and v3 are new)
  PR(t3) = 0.6 - Pw(t3|v1) - Pw(t3|v3) = 0.6
  PRED = {v1, v3, v4, v7, v9}

3rd iteration: t3 is the only candidate left => DTT = {t1, t2, t3}
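A hedged Python sketch of the greedy loop traced above; the probabilities are taken as inputs (estimating them is outside this sketch), and the helper names are illustrative:

```python
# DSPApprox-style greedy selection: repeatedly pick the term with the highest
# topicality * predictiveness, then discount the predictiveness of the
# remaining terms by the vocabulary the chosen term already predicts.
import math

def dsp_approx(terms, p_tq, p_c, p_w, covers):
    """terms: candidate topic terms; p_tq[t] = P(t|q); p_c[t] = Pc(t);
    p_w[(t, v)] = Pw(t|v); covers[t] = set of vocabulary terms t co-occurs with."""
    pr = {t: sum(p_w.get((t, v), 0.0) for v in covers[t]) / len(covers[t])
          for t in terms}                        # initial predictiveness PR(t)
    dtt, predicted, remaining = [], set(), set(terms)
    while remaining:
        def gain(t):
            tp = p_tq[t] * math.log10(p_tq[t] / p_c[t])  # topicality TP(t)
            return tp * pr[t]
        t_star = max(remaining, key=gain)
        dtt.append(t_star)
        remaining.remove(t_star)
        # discount remaining terms by the newly predicted vocabulary
        new_vocab = covers[t_star] - predicted
        for t in remaining:
            pr[t] -= sum(p_w.get((t, v), 0.0) for v in new_vocab)
        predicted |= new_vocab
    return dtt
```

On the example's numbers this selects t2, then t1, then t3, matching the trace above.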
Outline
  Introduction
  Topic Level Diversification
  Term Level Diversification
  Experiment
  Conclusions
Experiments
Experimental setup:
  Query set: 147 queries with relevance judgments from three years of the TREC Web Track's diversity task (2009, 2010, 2011)
  Retrieval collection: ClueWeb09 Category B
  Evaluation metrics: diversity tasks: α-NDCG, ERR-IA, NRBP, Precision-IA, subtopic recall; relevance-based: NDCG, ERR; all evaluated at the top 20 documents
  Parameter settings: k = 50; topic and term extraction techniques operate on these top 50 documents
Experiments
Q: statistically significant difference with respect to QL. W/L: queries won/lost with respect to α-NDCG.
This suggests that existing diversification frameworks are capable of returning relevant documents for topics without the explicit topical grouping.
Experiments
Q, M, L, K: statistically significant differences with respect to QL, MMR, LDA, and KNN respectively. Bold face: the best performance in each group.
Unigrams: improve more queries and hurt fewer. Phrases: retrieve more relevant results.
DSPApprox has slightly lower subtopic recall than query likelihood (QL), but the difference is not significant (97% overlap).
Conclusions
  Introduces a new approach to topical diversification: diversification at the term level.
  Existing work models a set of aspects for a query, where each aspect is a coherent group of terms; instead, the topic terms are modeled directly.
  Experiments indicate that the topical grouping provides little benefit to diversification compared to the presence of the terms themselves.
  Effectively reduces the task of finding a set of query topics, which has proven difficult, to finding a simple set of terms.