Finding the Right Facts in the Crowd: Factoid Question Answering over Social Media
J. Bian, Y. Liu, E. Agichtein, and H. Zha
ACM WWW, 2008
Introduction
Question Answering (QA) is a form of information retrieval where the user's information need is specified as a natural language question, and the desired result is a self-contained answer rather than a list of documents. Community Question Answering (CQA) sites are communities organized around QA, such as Yahoo! Answers and Naver, which archive millions of questions and hundreds of millions of answers. CQA can be a more effective alternative to web search, since it connects users to others who are willing to share information directly: users receive direct responses and do not have to browse search-engine results to locate their answers.
Challenges
Searching for existing answers is crucial to avoid duplication and to save users time and effort. However, existing search engines are not designed for answering queries that require deep semantic understanding. Example: consider the query "When is the hurricane season in the Caribbean?". With Yahoo! Search, users still need to click through to web pages to find the information.
Challenges (cont.)
Example (cont.): for the same query, Yahoo! Answers provides one brief, high-quality answer.
Challenges
A large portion of CQA content reflects personal, unsubstantiated opinions of users, which are not useful for factual information. To retrieve correct factual answers to a question, it is necessary to determine both the relevance and the quality of candidate answers. Explicit feedback from users, in the form of "best answer" selections or "thumbs up/down" ratings, can provide a strong indicator of the quality of an answer. However, how to integrate explicit user feedback and relevance into a single ranking remains an open problem.
Proposed solution: a ranking framework that takes advantage of user interaction information to retrieve high-quality, relevant content in social media.
Learning Ranking Functions
Problem definition of QA retrieval: given a user query Q, a set of QA pairs is ordered according to their relevance to Q by learning a ranking function over triples of the form (qr_k, qst_i, ans_ij), where qr_k is the k-th query in a set of queries, qst_i is the i-th question in the CQA system, and ans_ij is the j-th answer to qst_i.
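To make the setup concrete, here is a minimal sketch (my own, not from the paper) of the objects involved: each candidate item is a query-question-answer triple, and a learned scoring function h over the triple's feature vector induces the ranking.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class QATriple:
    query: str      # qr_k: the user's search query
    question: str   # qst_i: a question archived in the CQA system
    answer: str     # ans_ij: the j-th answer posted for qst_i


def rank(triples: List[QATriple],
         featurize: Callable[[QATriple], Sequence[float]],
         h: Callable[[Sequence[float]], float]) -> List[QATriple]:
    """Order triples by the learned scoring function h, highest score first."""
    return sorted(triples, key=lambda t: h(featurize(t)), reverse=True)
```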
User Interactions in CQA
Yahoo! Answers supports effective search of archived questions and answers, and allows its users to ask questions ("Asker"), answer questions ("Answerer"), and evaluate the system ("Evaluator") by voting for other users' answers, marking interesting questions, and reporting abusive behavior.
Features
Each query-question-answer triple is represented by textual features, i.e., the textual similarity between the query, the question, and the answers, and by statistical features, i.e., independent features of the query, the question, and the answers.
Features (cont.)
Each triple is also represented by social features, i.e., user interaction activities and community-based features that can approximate a user's expertise in the QA community.
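A hedged sketch of how the three feature groups might be assembled into one feature vector; the concrete features below (token overlap, text lengths, raw vote counts, an answerer's best-answer count) are illustrative placeholders, not the paper's actual feature set.

```python
from __future__ import annotations


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def extract_features(query: str, question: str, answer: str,
                     plus_votes: int, minus_votes: int,
                     answerer_best_answers: int) -> list[float]:
    return [
        # Textual: similarity between query, question, and answer.
        token_overlap(query, question),
        token_overlap(query, answer),
        token_overlap(question, answer),
        # Statistical: independent properties of each text.
        float(len(query.split())),
        float(len(question.split())),
        float(len(answer.split())),
        # Social: user-interaction / community signals.
        float(plus_votes),
        float(minus_votes),
        float(answerer_best_answers),
    ]
```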
Preference Data Extraction
Users' evaluation data are extracted as a set of preference data that can be used for ranking answers. For each query qr over the same question qst, consider two existing answers ans_1 and ans_2. Assume ans_1 has p_1 plus votes and m_1 minus votes out of n_1 impressions, whereas ans_2 has p_2 plus votes and m_2 minus votes out of n_2 impressions. To determine whether ans_1 is preferred over ans_2 in terms of relevance to qst, it is assumed that plus votes follow a binomial distribution.
Binomial Distribution
A binomial experiment (i.e., a sequence of Bernoulli trials) is a statistical experiment with the following properties: the experiment consists of N repeated trials; each trial results in just two possible outcomes, success or failure; the probability of success, denoted p, is the same on every trial, and the probability of failure is 1 - p; and the trials are independent, i.e., the outcome of one trial does not affect the outcome of the others. In a binomial experiment that (i) consists of N trials, (ii) results in x successes, and (iii) has success probability p on each individual trial, the binomial probability is B(x; N, p) = C(N, x) p^x (1 - p)^(N - x), where C(N, x) is the binomial coefficient, read as "x out of N".
Binomial Distribution: Example
On a 10-question multiple-choice test with 4 options per question, the probability of getting exactly 5 answers correct by guessing is B(5; 10, 0.25) = C(10, 5) (0.25)^5 (0.75)^5 ≈ 5.8%, where p = 0.25, 1 - p = 0.75, x = 5, and N = 10. Thus, somebody who guesses all 10 answers on a multiple-choice test with 4 options has about a 5.8% chance of getting exactly 5 correct.
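The same arithmetic in code, using only the standard library; binomial_pmf is a generic helper, not code from the paper.

```python
from math import comb


def binomial_pmf(x: int, N: int, p: float) -> float:
    """Probability of exactly x successes in N independent trials with success rate p."""
    return comb(N, x) * p**x * (1 - p)**(N - x)


print(binomial_pmf(5, 10, 0.25))   # ~0.0584, i.e., about 5.8%
```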
Preference Data Extraction
To determine whether the pair ans_1 and ans_2 is significant, i.e., whether there are enough votes to compare the pair, the likelihood ratio test is applied: if λ > threshold, the pair is significant. To determine the preference within a significant pair, if p_1 / (p_1 + m_1 + s) > p_2 / (p_2 + m_2 + s), where s is a positive smoothing constant, then ans_1 is preferred over ans_2 (ans_1 ≻ ans_2); otherwise ans_2 is preferred over ans_1 (ans_2 ≻ ans_1).
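A sketch of the vote-based preference extraction under stated assumptions: the smoothed-ratio comparison mirrors the slide, while the concrete likelihood-ratio statistic (a two-binomial log-likelihood ratio with an illustrative chi-square-style threshold) is my own stand-in, since the slide does not spell out the exact test.

```python
from math import log


def _ll(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k successes in n trials at rate p (0*log 0 := 0)."""
    out = 0.0
    if k > 0:
        out += k * log(p)
    if n - k > 0:
        out += (n - k) * log(1 - p)
    return out


def likelihood_ratio(p1: int, n1: int, p2: int, n2: int) -> float:
    """lambda = 2 * (logL under separate plus-vote rates - logL under a pooled rate)."""
    pooled = (p1 + p2) / (n1 + n2)
    alt = _ll(p1, n1, p1 / n1) + _ll(p2, n2, p2 / n2)
    null = _ll(p1, n1, pooled) + _ll(p2, n2, pooled)
    return 2.0 * (alt - null)


def prefer(p1, m1, n1, p2, m2, n2, s=1.0, threshold=3.84):
    """+1 if ans_1 is preferred, -1 if ans_2 is preferred, 0 if the pair is
    not significant (too few votes to compare)."""
    if likelihood_ratio(p1, n1, p2, n2) <= threshold:
        return 0
    r1 = p1 / (p1 + m1 + s)  # smoothed plus-vote rate of ans_1
    r2 = p2 / (p2 + m2 + s)  # smoothed plus-vote rate of ans_2
    return +1 if r1 > r2 else -1
```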
Preference Data Extraction
For two query-question-answer items with the same query, i.e., (qr, qst_1, ans_1) and (qr, qst_2, ans_2), let their feature vectors be X and Y. If ans_1 has a higher labeled grade than ans_2, then the preference X ≻ Y is included; if, on the other hand, ans_2 has a higher labeled grade than ans_1, then the preference Y ≻ X is included. Suppose the set of available preferences is S = {⟨x_i, y_i⟩ | x_i ≻ y_i, i = 1, ..., N}, where x_i and y_i denote the feature vectors of two query-question-answer triples with the same query, and x ≻ y means that x is preferred over y, i.e., x should be ranked higher than y.
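A minimal sketch of building the preference set S from labeled grades; the names and the tuple layout here are illustrative.

```python
def preferences_from_grades(items):
    """items: (query_id, feature_vector, relevance_grade) triples.
    Returns the preference set S as (x, y) pairs, meaning x should rank above y."""
    S = []
    for qid_a, x, grade_x in items:
        for qid_b, y, grade_y in items:
            # Only items that answer the same query are comparable.
            if qid_a == qid_b and grade_x > grade_y:
                S.append((x, y))
    return S
```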
Learning Ranking from Preference Data
The problem of learning ranking functions is cast as computing a ranking function h ∈ H that matches the set of preferences, i.e., h(x_i) ≥ h(y_i) whenever x_i ≻ y_i, for i = 1, ..., N. R(h) is the objective function (a squared hinge loss) that measures the risk of a given ranking function h; a preference x_i ≻ y_i is a contradicting pair w.r.t. h if h(x_i) < h(y_i). H is a function class, chosen to be linear combinations of regression trees. The minimization problem min_{h ∈ H} R(h) is solved using functional gradient descent, an algorithm based on gradient boosting.
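The slide does not reproduce the formula for R(h); a plausible reconstruction, following the standard GBRank formulation with a margin τ > 0, is:

```latex
\mathcal{R}(h) = \frac{1}{2} \sum_{i=1}^{N}
  \bigl( \max\{\, 0,\; h(y_i) - h(x_i) + \tau \,\} \bigr)^{2},
\qquad h^{*} = \arg\min_{h \in \mathcal{H}} \mathcal{R}(h).
```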
Learning Ranking from Preference Data
The ranking function h is learned using gradient boosting (GBRank), an algorithm that optimizes a cost function over function space by iteratively choosing a base (ranking) function, here a regression/decision tree; the number of iterations is determined by cross-validation.
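A hedged sketch of a GBRank-style training loop with regression trees as base learners; scikit-learn's DecisionTreeRegressor stands in for the paper's base learner, and tau, eta, the number of rounds, and the tree depth are illustrative hyperparameters (the paper tunes the number of iterations by cross-validation).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def gbrank(X, Y, n_rounds=50, tau=1.0, eta=0.05, max_depth=3):
    """X[i] should rank above Y[i]; X and Y are (N, d) arrays of feature vectors."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    trees, h_x, h_y = [], np.zeros(len(X)), np.zeros(len(Y))
    for k in range(1, n_rounds + 1):
        # Contradicting pairs: the current model violates the margin tau.
        bad = h_x < h_y + tau
        if not bad.any():
            break
        # Regression targets push x up and y down for the contradicting pairs.
        data = np.vstack([X[bad], Y[bad]])
        target = np.concatenate([h_y[bad] + tau, h_x[bad] - tau])
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(data, target)
        trees.append(tree)
        # Convex ensemble update: h_k = (k * h_{k-1} + eta * g_k) / (k + 1).
        h_x = (k * h_x + eta * tree.predict(X)) / (k + 1)
        h_y = (k * h_y + eta * tree.predict(Y)) / (k + 1)
    return trees


def predict(trees, X, eta=0.05):
    """Score new feature vectors with the same convex combination of trees."""
    h = np.zeros(len(X))
    for k, tree in enumerate(trees, start=1):
        h = (k * h + eta * tree.predict(np.asarray(X, float))) / (k + 1)
    return h
```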
Experimental Setup
Datasets: 1,250 factoid questions from the TREC QA benchmark data. QA collection dataset: each query Q is submitted to Yahoo! Answers, up to 10 top-ranked related questions are extracted, and as many answers as are available are retrieved; this yields 89,642 tuples in total, 17,711 relevant and 71,931 non-relevant. Evaluation metrics: MRR, P@K, and MAP. Ranking methods compared: Baseline_Yahoo (ordered by posting date), Baseline_Votes (ordered by Positive_Votes - Negative_Votes), and GBRanking (ranking with the proposed community/social features).
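For reference, minimal implementations of the three evaluation metrics on a single ranked list of 0/1 relevance labels (corpus-level MRR and MAP are just the means of the per-query values):

```python
def reciprocal_rank(ranked):
    """1/rank of the first relevant answer, or 0 if none is relevant."""
    for i, rel in enumerate(ranked, start=1):
        if rel:
            return 1.0 / i
    return 0.0


def precision_at_k(ranked, k):
    """Fraction of relevant answers among the top k."""
    return sum(ranked[:k]) / k


def average_precision(ranked):
    """Mean of precision values at the ranks of the relevant answers."""
    hits, score = 0, 0.0
    for i, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0
```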
Experimental Results: Ranking Methods Compared
For each TREC query, there is a list of Yahoo! questions (YQ_a, YQ_b, ...), and for each question there are multiple answers (YQ_a1, YQ_a2, ...).
Experimental Results: Ranking Methods
MRR_MAX: calculate the MRR value of each Yahoo! Answers question and choose the highest value as the TREC query's MRR. This simulates an "intelligent" user who always selects the most relevant retrieved Yahoo! question first.
MRR_STRICT: same as MRR_MAX, but take the average of the questions' MRR values as the TREC query's MRR. This simulates a user who blindly follows Yahoo! Answers' ranking and the corresponding ordered answers.
MRR_RR (Round Robin): use YQ_a's first answer as the TREC query's first answer, YQ_b's first answer as the TREC query's second answer, and so on. This simulates a "jumpy" user who believes in first answers.
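A sketch of the three aggregation strategies for a single TREC query, where answer_lists holds each related Yahoo! question's answers as 0/1 relevance labels in ranked order; the round-robin interleaving (all first answers, then all second answers, and so on) is my reading of the slide.

```python
from itertools import chain, zip_longest


def reciprocal_rank(ranked):
    """1/rank of the first relevant answer, or 0 if none is relevant."""
    return next((1.0 / i for i, rel in enumerate(ranked, start=1) if rel), 0.0)


def mrr_max(answer_lists):
    # "Intelligent" user: take the best related question's MRR.
    return max(reciprocal_rank(answers) for answers in answer_lists)


def mrr_strict(answer_lists):
    # User who follows the given ranking: average over the related questions.
    return sum(reciprocal_rank(answers) for answers in answer_lists) / len(answer_lists)


def mrr_rr(answer_lists):
    # "Jumpy" user: interleave answers round-robin across the questions.
    interleaved = [rel for rel in chain(*zip_longest(*answer_lists)) if rel is not None]
    return reciprocal_rank(interleaved)
```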
Experimental Results: Ranking Methods Compared
MAX performs better than the other two aggregation strategies for both baselines (Baseline_Yahoo and Baseline_Votes). GBrank is better still, achieving a relative gain of 18% over MAX.
Experimental Results: Learning Ranking Function
The ranking function is learned using 10-fold cross-validation on 400 of the 1,250 TREC queries.
Experimental Results: Robustness to Noisy Labels
Using 50 manually-labeled queries and 350 randomly selected TREC queries with related questions and answers, the results show that a nearly-optimal model is generated even when trained on noisy relevance labels.
Experimental Results: Study on Feature Set
P@K when the ranking function is learned with each feature category removed in turn. The results show that users' evaluations play a very important role in learning the ranking function.