A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon, W. Bruce Croft (University of Massachusetts-Amherst), Joon Ho Lee (Soongsil University, South Korea), Soyeon Park (Duksung Women's University, South Korea) SIGIR 2006

INTRODUCTION Many web service providers keep non-textual information related to their document collections, such as click-through counts or user recommendations. This non-textual information has great potential for improving search quality. – Ex: in homepage finding, link information has proved to be very helpful.

INTRODUCTION For our experiments, we choose a community-based question answering service, where users ask and answer questions to help each other. These services typically search their collection of question and answer (Q&A) pairs to see whether the same question has previously been asked.

INTRODUCTION In the retrieval of Q&A pairs, estimating the quality of answers is important because some questions have bad answers. – Ex:
– Q: What is the minimum positive real number in Matlab? A: Your IQ.
– Q: What is new in Java 2.0? A: Nothing new.

INTRODUCTION We use kernel density estimation and the maximum entropy approach to handle various types of non-textual features, and build a process that can predict the quality of documents associated with the features. To test whether quality prediction can improve retrieval results, we incorporate our quality measure into the query likelihood retrieval model.

DATA COLLECTION We collected 6.8 million Q&A pairs from the Naver Q&A service. We randomly selected 125 queries from the search log and ran 6 different search engines to gather the top 20 Q&A pairs from each search result. Annotators manually judged the candidates at three levels: Bad, Medium and Good.

DATA COLLECTION Annotators read the question part of each Q&A pair; if it addressed the same information need as the query, the Q&A pair was judged relevant. When the information need of a query was not clear, annotators looked up the query's click-through logs and guessed the user's intent. In all, we found 1,700 relevant Q&A pairs.

Manual Judgment of Answer Quality and Relevance In general, good answers tend to be relevant, informative, objective, sincere and readable. Our annotators read the answers, consider all of the above factors, and rate the quality of each answer at three levels: Bad, Medium and Good.

Manual Judgment of Answer Quality and Relevance To build a machine-learning-based quality predictor, we need training samples. We randomly selected 894 new Q&A pairs from the Naver collection and manually judged the quality of the answers in the same way. Table 1 shows that the test and training samples have similar statistics.

Feature Extraction

Feature Analysis Surprisingly, "Questioner's Self Evaluation" is not the feature with the strongest correlation with the quality of the answer. This means the questioner's self evaluation is often subjective. With the exception of answer length, most of the important features are related to the expertise or the quality of the answerer. This result implies that knowing about the answerer is very important in estimating the quality of answers.

Feature Conversion using KDE We propose using kernel density estimation (KDE). KDE is a nonparametric density estimation technique that overcomes the shortcomings of histograms. In KDE, neighboring data points are averaged to estimate the probability density at a given point. We use the Gaussian kernel to give more influence to closer data points. The probability of having a good answer given only the answer length, P(good | AL), can be calculated from the density distributions.

Feature Conversion using KDE $$P(good \mid AL) = \frac{F_{good}(AL)\,P(good)}{F_{good}(AL)\,P(good) + F_{bad}(AL)\,P(bad)}$$ where AL denotes the answer length and F() is the density function estimated using KDE. P(good) is the prior probability of having a good-quality answer, estimated from the training data using the maximum likelihood estimator. P(bad) is measured in the same way.

Feature Conversion using KDE

Good answers are usually longer than bad answers, but very long, bad-quality answers also exist. We use P(good | AL) as our feature value instead of using the answer length directly.
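A minimal sketch of this conversion in Python, assuming SciPy is available; the answer lengths, and therefore the resulting probabilities, are made up for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy answer lengths (in words) for training answers judged good vs. bad.
good_lengths = np.array([120, 85, 200, 150, 95, 300, 175], dtype=float)
bad_lengths = np.array([5, 12, 40, 8, 25, 60, 15], dtype=float)

# Gaussian-kernel density estimates F_good and F_bad.
f_good = gaussian_kde(good_lengths)
f_bad = gaussian_kde(bad_lengths)

# Class priors estimated from the training labels (maximum likelihood).
p_good = len(good_lengths) / (len(good_lengths) + len(bad_lengths))
p_bad = 1.0 - p_good

def p_good_given_length(al: float) -> float:
    """Bayes' rule over the KDE densities: P(good | AL)."""
    num = f_good(al)[0] * p_good
    return num / (num + f_bad(al)[0] * p_bad)

print(p_good_given_length(10))   # short answer -> low probability of being good
print(p_good_given_length(180))  # long answer -> high probability of being good
```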

Maximum Entropy for Answer Quality Estimation We assume that there is a random process that observes a Q&A pair and generates a label y, an element of a finite set Y = {good, bad}. Our goal is to build a stochastic model that is close to this random process. We construct a training dataset by observing the behavior of the random process: (x_1, y_1), (x_2, y_2), …, (x_N, y_N), where x_i is a question and answer pair and y_i is a label that represents the quality of the answer.

Maximum Entropy for Answer Quality Estimation To avoid confusion with the document features, we refer to the feature functions as predicates. Each predicate corresponds to one of the document features described in the previous section.
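The slides do not show the predicate definitions; one plausible form, assuming the converted feature values are used as real-valued maximum entropy features, is

$$f_{AL}(x, y) = \begin{cases} P(good \mid AL_x) & \text{if } y = good \\ 0 & \text{otherwise} \end{cases}$$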

Maximum Entropy for Answer Quality Estimation The model is constrained so that the expected value of each predicate under the model matches its expected value in the training data: $$\sum_{x,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x,y) = \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y)$$ where $\tilde{p}$ is an empirical probability distribution that can be easily calculated from the training data.

Finding Optimal Models In many cases, there are infinitely many models that satisfy the constraints explained in the previous subsection. In the maximum entropy approach, we choose the model with maximum conditional entropy.
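In the standard maximum entropy formulation, the conditional entropy being maximized is

$$H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)$$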

Finding Optimal Models We use Zhang Le's maximum entropy toolkit for the experiments. Each predicate f_i has a corresponding parameter $\lambda_i$, and the final model takes the standard exponential form: $$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big), \qquad Z(x) = \sum_y \exp\Big(\sum_i \lambda_i f_i(x, y)\Big)$$
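A toy evaluation of this final equation in Python. The feature values and weights are made up, and the code is a generic sketch rather than a call into Zhang Le's toolkit:

```python
import math

def maxent_posterior(features, weights):
    """p(y | x) = exp(sum_i lambda_i[y] * f_i(x)) / Z(x) for a two-label model.
    `features` holds the predicate values f_i(x); `weights[y]` holds the
    learned parameter for each predicate under label y."""
    scores = {
        y: math.exp(sum(lam * f for lam, f in zip(lams, features)))
        for y, lams in weights.items()
    }
    z = sum(scores.values())  # the normalizer Z(x)
    return {y: s / z for y, s in scores.items()}

# Illustrative converted features, e.g. P(good | AL) and a converted
# answerer-activity feature, with made-up weights.
features = [0.82, 0.64]
weights = {"good": [1.3, 0.9], "bad": [-1.3, -0.9]}
print(maxent_posterior(features, weights))  # probabilities summing to 1
```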

RETRIEVAL EXPERIMENTS As a baseline experiment, we retrieve Q&A pairs using the query likelihood retrieval model. The 125 queries are used, and the question part of each Q&A pair is searched to find Q&A pairs relevant to the query, because the question part is known to be much more useful than the answer part for finding relevant Q&A pairs. We then incorporate the quality measure into the baseline system and compare retrieval performance.

Retrieval Framework In our approach, the document prior is the predicted answer quality: P(D) = p(y = good | x = D). To avoid zero probabilities and estimate more accurate document language models, documents are smoothed using a background collection.
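With the quality prior in place, ranking follows the usual query likelihood form, where the q_i are the query terms:

$$P(D \mid Q) \propto P(Q \mid D)\, P(D) = P(D) \prod_{i=1}^{n} P(q_i \mid D)$$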

Retrieval Framework P ml (w|C) is the probability that the term w is generated from the collection C. P ml (w|C) is estimated using the maximum likelihood estimator. is the smoothing parameter.

Evaluation Method We made three different relevance judgment files. The first (Rel_1) considers only the relevance between the query and the question. The second (Rel_2) considers both the relevance and the quality of Q&A pairs: if the quality of the answer is judged 'bad', the Q&A pair is removed from the relevance judgment file. The last judgment file (Rel_3) imposes a stronger quality requirement: if the quality of the answer is judged 'bad' or 'medium', the Q&A pair is removed from the file.
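A sketch of deriving the three judgment files, assuming each judged pair is represented as a (query_id, qa_id, relevant, quality) tuple:

```python
def build_judgments(judged_pairs):
    """Split manual judgments into the three relevance files.
    quality is one of "bad", "medium", "good"."""
    rel_1 = {(q, d) for q, d, rel, qual in judged_pairs if rel}
    rel_2 = {(q, d) for q, d, rel, qual in judged_pairs if rel and qual != "bad"}
    rel_3 = {(q, d) for q, d, rel, qual in judged_pairs if rel and qual == "good"}
    return rel_1, rel_2, rel_3

pairs = [("q1", "d1", True, "good"), ("q1", "d2", True, "bad"),
         ("q2", "d3", True, "medium"), ("q2", "d4", False, "good")]
print(build_judgments(pairs))
```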

Experimental Results

Surprisingly, retrieval performance is significantly improved even when we use the relevance judgment file that does not consider quality. This implies that bad-quality Q&A pairs tend not to be relevant to any query; incorporating the quality measure pulls these useless Q&A pairs down to lower ranks and improves the retrieval results overall.

Experimental Results Because Rel_2 has a smaller number of relevant Q&A pairs and Rel_3 contains even fewer, the retrieval performance is lower. However, the performance drop becomes much less dramatic when we integrate the quality measure.

CONCLUSION AND FUTURE WORK We showed how non-textual features can be processed systematically and statistically to improve search quality, and we achieved significant improvements in retrieval performance. We therefore believe our approach can be applied to other web services. We plan to improve the feature selection mechanism, develop a framework that can handle textual and non-textual features together, and apply it to other web services.