Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media
J. Bian, Y. Liu, E. Agichtein, and H. Zha. ACM WWW, 2008.

Introduction
- Question Answering (QA)
  - A form of information retrieval where the user's information need is specified as a natural language question
  - The desired result is a self-contained answer, not a list of documents
- Community Question Answering (CQA)
  - Communities organized around QA, such as Yahoo! Answers and Naver
  - Archive millions of questions and hundreds of millions of answers
  - A more effective alternative to web search, since CQA connects users directly to others who are willing to share information
  - Users receive direct responses and do not have to browse search engine results to locate their answers

Challenges
- Searching for existing answers is crucial to avoid duplication and to save users time and effort
- However, existing search engines are not designed to answer queries that require deep semantic understanding
  - Example: for the query "When is the hurricane season in the Caribbean?", Yahoo! Search users still need to click into web pages to find the information

Challenges (cont.)
- Example (cont.): Yahoo! Answers provides one brief, high-quality answer

Challenges (cont.)
- A large portion of CQA content reflects the personal, unsubstantiated opinions of users, which are not useful as factual information
- To retrieve correct factual answers to a question, it is necessary to determine the relevance and quality of candidate answers
  - Explicit feedback from users, in the form of "best answer" selections or "thumbs up/down" ratings, can provide a strong indicator of the quality of an answer
  - However, how to integrate explicit user feedback and relevance into a single ranking remains an open problem
- Proposed solution: a ranking framework that takes advantage of user interaction information to retrieve high-quality, relevant content in social media

Learning Ranking Functions
- Problem definition of QA retrieval
  - Given a user query Q, a set of QA pairs is ordered according to their relevance to Q by learning a ranking function over triples of the form (qr_k, qst_i, ans_ij), where qr_k is the k-th query in a set of queries, qst_i is the i-th question in a CQA system, and ans_ij is the j-th answer to qst_i
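Below is a minimal Python sketch (not from the paper) of this setting: each (query, question, answer) triple is assumed to carry a feature vector, and a learned scoring function h orders the candidates for one query. The class and function names are illustrative.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class QATriple:
    query: str                 # qr_k: the user query
    question: str              # qst_i: an archived CQA question
    answer: str                # ans_ij: an answer posted for qst_i
    features: Sequence[float]  # feature vector for this triple

def rank_candidates(h: Callable[[Sequence[float]], float],
                    candidates: List[QATriple]) -> List[QATriple]:
    """Order the candidate QA pairs for one query by the learned score h(x)."""
    return sorted(candidates, key=lambda t: h(t.features), reverse=True)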

User Interactions in CQA
- Yahoo! Answers supports effective search of archived questions and answers, and allows its users to
  - Ask questions ("Asker")
  - Answer questions ("Answerer")
  - Evaluate the system ("Evaluator") by voting on other users' answers, marking interesting questions, and reporting abusive behavior

Features
- Each query-question-answer triple is represented by
  - Textual features, i.e., the textual similarity between the query, question, and answers
  - Statistical features, i.e., independent features of the query, question, and answers

Features (cont.)
  - Social features, i.e., user interaction activities and community-based features that can approximate users' expertise in the QA community
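As an illustration only, the sketch below builds such a feature vector from one triple; the specific features and the asker/answerer statistics dictionaries are assumptions standing in for the paper's full feature set.

def extract_features(query, question, answer, asker_stats, answerer_stats):
    """Combine textual, statistical, and social signals into one feature vector (illustrative)."""
    q_terms, qst_terms, ans_terms = set(query.split()), set(question.split()), set(answer.split())
    textual = [
        len(q_terms & qst_terms) / max(len(q_terms), 1),  # query-question term overlap
        len(q_terms & ans_terms) / max(len(q_terms), 1),  # query-answer term overlap
    ]
    statistical = [
        len(question.split()),  # question length
        len(answer.split()),    # answer length
    ]
    social = [
        answerer_stats.get("best_answer_ratio", 0.0),  # proxy for answerer expertise
        answerer_stats.get("total_answers", 0),
        asker_stats.get("questions_asked", 0),
    ]
    return textual + statistical + social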

Preference Data Extraction
- Users' evaluation data are extracted as a set of preference data that can be used for ranking answers
- For a query qr, consider two existing answers ans_1 and ans_2 to the same question qst
  - Assume ans_1 has p_1 plus votes and m_1 minus votes out of n_1 impressions, whereas ans_2 has p_2 plus votes and m_2 minus votes out of n_2 impressions
  - To determine whether ans_1 is preferred over ans_2 in terms of relevance to qst, the plus votes are assumed to follow a binomial distribution

Binomial Distribution
- A binomial experiment (a sequence of Bernoulli trials) is a statistical experiment with the following properties:
  - The experiment consists of N repeated trials
  - Each trial results in just two possible outcomes, i.e., success or failure
  - The probability of success, denoted p, is the same on every trial; the probability of failure is 1 - p
  - The trials are independent, i.e., the outcome of one trial does not affect the outcome of other trials
- In a binomial experiment that (i) consists of N trials, (ii) results in x successes, and (iii) has probability of success p on an individual trial, the binomial probability is
  B(x; N, p) = C(N, x) * p^x * (1 - p)^(N - x),
  where C(N, x) is the binomial coefficient, read as "x out of N"

Binomial Distribution (cont.)
- Example: On a 10-question multiple-choice test with 4 options per question, the probability of getting exactly 5 answers correct by guessing is
  B(5; 10, 0.25) = C(10, 5) * (0.25)^5 * (0.75)^5 ≈ 5.8%,
  where p = 0.25, 1 - p = 0.75, x = 5, N = 10
- Thus, someone who guesses all 10 answers on a multiple-choice test with 4 options has about a 5.8% chance of getting exactly 5 correct
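The calculation can be checked with a few lines of Python; math.comb gives the binomial coefficient.

from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """B(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# The slide's example: 5 correct guesses out of 10 questions with 4 options each.
print(round(binomial_pmf(5, 10, 0.25), 3))  # 0.058, i.e. about 5.8%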

Preference Data Extraction (cont.)
- To determine whether the pair ans_1, ans_2 is significant, i.e., whether there are enough votes to compare the pair, a likelihood ratio test is applied; if λ > threshold, the pair is significant
- To determine the preference within a significant pair: if
  p_1 / (p_1 + m_1 + s) > p_2 / (p_2 + m_2 + s),
  where s is a positive constant, then ans_1 is preferred over ans_2, denoted ans_1 ≻ ans_2; otherwise ans_2 is preferred over ans_1, denoted ans_2 ≻ ans_1
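A hedged Python sketch of this extraction rule follows; the default smoothing constant s and the simple minimum-vote check (standing in for the likelihood ratio test and its threshold λ) are assumptions, not the paper's exact procedure.

def prefer(p1, m1, p2, m2, s=1.0, min_votes=5):
    """Return 1 if ans_1 is preferred, -1 if ans_2 is preferred, 0 if the pair is not significant."""
    # Crude significance check: require enough total votes on both answers
    # (a stand-in for the likelihood ratio test with threshold lambda).
    if (p1 + m1) < min_votes or (p2 + m2) < min_votes:
        return 0
    score1 = p1 / (p1 + m1 + s)  # smoothed fraction of plus votes for ans_1
    score2 = p2 / (p2 + m2 + s)  # smoothed fraction of plus votes for ans_2
    if score1 > score2:
        return 1
    if score2 > score1:
        return -1
    return 0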

Preference Data Extraction (cont.)
- For two query-question-answer items with the same query, i.e., (qr, qst_1, ans_1) and (qr, qst_2, ans_2), let their feature vectors be X and Y
  - If ans_1 has a higher labeled grade than ans_2, then the preference X ≻ Y is included
  - If, on the other hand, ans_2 has a higher labeled grade than ans_1, then the preference Y ≻ X is included
- Suppose the set of available preferences is
  S = {⟨x_i, y_i⟩ | x_i ≻ y_i, i = 1, ..., N},
  where x and y denote the feature vectors of two query-question-answer triples with the same query, and x ≻ y means that x is preferred over y, i.e., x should be ranked higher than y

Learning Ranking from Preference Data
- The problem of learning ranking functions is cast as computing the ranking function h that matches the set of preferences, i.e.,
  h(x_i) ≥ h(y_i) whenever x_i ≻ y_i, for i = 1, ..., N
- R(h) is the objective function (a squared hinge loss) that measures the risk of a given ranking function h, where x_i ≻ y_i is a contradicting pair w.r.t. h if h(x_i) < h(y_i); it takes the form
  R(h) = (1/2) Σ_{i=1..N} (max{0, h(y_i) - h(x_i)})^2
- h is drawn from a function class H, chosen to be linear combinations of regression trees
- The minimization problem min_{h ∈ H} R(h) is solved using functional gradient descent, an algorithm based on gradient boosting

Learning Ranking from Preference Data (cont.)
- The ranking function h is learned using gradient boosting (GBRank)
  - An algorithm that optimizes a cost function over function space by iteratively fitting a base (ranking) function, here a regression (decision) tree
  - The shrinkage parameter and the number of iterations are determined by cross-validation
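The sketch below illustrates a GBRank-style boosting loop using scikit-learn regression trees as base learners. The margin tau, shrinkage, and tree depth are illustrative defaults, and the simple additive update is a simplification rather than the paper's exact algorithm.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbrank(pairs_X, pairs_Y, n_iter=50, shrinkage=0.1, tau=1.0, max_depth=3):
    """pairs_X[i] should be ranked above pairs_Y[i]; returns a scoring function."""
    pairs_X, pairs_Y = np.asarray(pairs_X), np.asarray(pairs_Y)
    trees = []

    def score(X):
        s = np.zeros(len(X))
        for t in trees:
            s = s + shrinkage * t.predict(X)
        return s

    for _ in range(n_iter):
        sx, sy = score(pairs_X), score(pairs_Y)
        bad = sx < sy + tau  # pairs that violate the desired margin
        if not bad.any():
            break
        # Fit a regression tree that pushes violated pairs apart:
        # move h(x) toward s(y) + tau and h(y) toward s(x) - tau.
        X_fit = np.vstack([pairs_X[bad], pairs_Y[bad]])
        y_fit = np.concatenate([sy[bad] + tau, sx[bad] - tau])
        trees.append(DecisionTreeRegressor(max_depth=max_depth).fit(X_fit, y_fit))

    return lambda X: score(np.asarray(X))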

Experimental Setup
- Datasets
  - 1,250 factoid questions from the TREC QA benchmark data
  - QA collection dataset: each query Q is submitted to Yahoo! Answers, up to 10 top-ranked related questions are extracted, and as many answers as available are retrieved
  - Total number of tuples: 89,642, with 17,711 relevant and 71,931 non-relevant
- Evaluation metrics: MRR and MAP
- Ranking methods compared
  - Baseline_Yahoo (ordered by posting date)
  - Baseline_Votes (ordered by Positive_Votes - Negative_Votes)
  - GBRank (ordered by the learned ranking function over the proposed features, including community/social features)
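For reference, a minimal sketch of the two evaluation metrics over binary relevance labels (one ranked list of 0/1 labels per query) might look like this.

def mrr(ranked_relevance_lists):
    """Mean Reciprocal Rank over a set of queries."""
    rr = []
    for labels in ranked_relevance_lists:
        rank = next((i + 1 for i, r in enumerate(labels) if r), None)
        rr.append(1.0 / rank if rank else 0.0)
    return sum(rr) / len(rr)

def average_precision(labels):
    """Average precision for one ranked list of binary labels."""
    hits, precisions = 0, []
    for i, r in enumerate(labels, start=1):
        if r:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0

def map_score(ranked_relevance_lists):
    """Mean Average Precision over a set of queries."""
    return sum(average_precision(l) for l in ranked_relevance_lists) / len(ranked_relevance_lists)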

Experimental Results
- Ranking methods compared
  - For each TREC query, there is a list of Yahoo! questions (YQ_a, YQ_b, ...), and for each question there are multiple answers (YQ_a1, YQ_a2, ...)

Experimental Results (cont.)
- Ranking evaluation variants
  - MRR_MAX: calculate the MRR value of each retrieved Yahoo! Answers question and take the highest value as the TREC query's MRR; simulates an "intelligent" user who always selects the most relevant retrieved Yahoo! question first
  - MRR_STRICT: same as MRR_MAX, but take the average of the MRR values as the TREC query's MRR; simulates a user who blindly follows Yahoo! Answers' ranking and the corresponding ordered answers
  - MRR_RR (Round Robin): use YQ_a's 1st answer as the TREC query's 1st answer, YQ_b's 1st answer as the TREC query's 2nd answer, and so on; simulates a "jumpy" user who trusts first answers
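The three variants can be sketched as follows, assuming each retrieved Yahoo! question contributes a ranked list of binary relevance labels for one TREC query; function names are illustrative.

def rr(labels):
    """Reciprocal rank of the first relevant answer in one ranked list."""
    return next((1.0 / (i + 1) for i, r in enumerate(labels) if r), 0.0)

def mrr_max(per_question_labels):
    # "Intelligent" user: take the best question's answer list.
    return max(rr(labels) for labels in per_question_labels)

def mrr_strict(per_question_labels):
    # User who follows every retrieved question in order: average over the lists.
    return sum(rr(labels) for labels in per_question_labels) / len(per_question_labels)

def mrr_rr(per_question_labels):
    # Round-robin user: the 1st answer of each question, then the 2nd, and so on.
    merged = []
    for depth in range(max(len(l) for l in per_question_labels)):
        for labels in per_question_labels:
            if depth < len(labels):
                merged.append(labels[depth])
    return rr(merged)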

Experimental Results (cont.)
- Ranking methods compared (Baseline_Yahoo and Baseline_Votes vs. GBRank)
  - MAX performs better than the other two variants for the baselines
  - GBRank performs better still, achieving a gain of 18% relative to MAX

Experimental Results (cont.)
- Learning the ranking function
  - Using 10-fold cross-validation on 400 of the 1,250 TREC queries

Experimental Results (cont.)
- Robustness to noisy labels
  - Use 50 manually labeled queries and randomly select 350 TREC queries with related questions and answers
  - Results show that a nearly optimal model is learned even when trained on noisy relevance labels

Experimental Results (cont.)
- Study of the feature set
  - The ranking function is learned with each feature category removed in turn

Experimental Results (cont.)
- Study of the feature set (cont.)
  - Users' evaluations play a very important role in learning the ranking function