Published by Myles Curtis. Modified over 9 years ago.
1
LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS
Date: 2012/11/22
Authors: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor
Source: WWW ’12
Advisor: Dr. Jia-Ling Koh
Speaker: Yi-Hsuan Yeh
2
OUTLINE
Introduction
Description of approach
    Stage one: top candidate selection
    Stage two: top candidate validation
Experiment
    Offline
    Online
Conclusion
3
INTRODUCTION
Users struggle to express their information need as a short query.
4
INTRODUCTION
Community-based Question Answering (CQA) sites, such as Yahoo! Answers or Baidu Zhidao.
Each question has a title and a body.
15% of the questions remain unanswered.
Idea: answer new questions with the answers to past resolved questions.
6
A TWO-STAGE APPROACH
Stage one: find the most similar past question.
Stage two: decide whether or not to serve its answer.
7
STAGE ONE: TOP CANDIDATE SELECTION
Vector-space unigram model with TF-IDF weights.
Question titles are represented as TF-IDF vectors over the terms w1 ... wn:

            w1     w2     w3     ...    wn
Qnew        0.1    0.2    0.12   ...    0.8
Qpast 1     0.3    0.5    0.2    ...    0.1
Qpast 2     0.2    0      0.1    ...    0.6
...
Qpast n     0.9    0.3    0.5    ...    0.1

Filtering: the cosine similarity between the titles must pass the threshold α.
Ranking: Cos(Qpast title+body, Qnew title+body) => the top candidate past question and its answer A.
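The selection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed IDF formula, the default `alpha=0.3`, and the sample questions are all hypothetical, and the order "filter on titles, then rank on title+body" is our reading of the slide.

```python
import math
from collections import Counter

PUNCT = ".,!?:;\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation at token edges."""
    stripped = (raw.strip(PUNCT).lower() for raw in text.split())
    return [tok for tok in stripped if tok]


def idf_table(docs):
    """Smoothed IDF (an assumed variant): log((N + 1) / (df + 1)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector stored as a dict term -> weight."""
    return {t: c * idf[t] for t, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_candidate(new_q, past_qs, alpha=0.3):
    """Return (score, past question) for the best match, or None."""
    everyone = past_qs + [new_q]
    titles = [tokenize(q["title"]) for q in everyone]
    fulls = [tokenize(q["title"] + " " + q["body"]) for q in everyone]
    idf_title, idf_full = idf_table(titles), idf_table(fulls)
    new_title = tfidf(titles[-1], idf_title)
    new_full = tfidf(fulls[-1], idf_full)
    best = None
    for i, past in enumerate(past_qs):
        # Filter: title similarity must pass the threshold alpha.
        if cosine(new_title, tfidf(titles[i], idf_title)) < alpha:
            continue
        # Rank: similarity of title+body against title+body.
        score = cosine(new_full, tfidf(fulls[i], idf_full))
        if best is None or score > best[0]:
            best = (score, past)
    return best


# Hypothetical sample data, for illustration only.
past_questions = [
    {"title": "Why doesn't my dog eat?",
     "body": "He has refused food since yesterday.",
     "answer": "Take him to a vet."},
    {"title": "Best shampoo for dry hair?",
     "body": "My hair is very dry in winter.",
     "answer": "Try a moisturizing shampoo."},
]
new_question = {"title": "Why doesn't my cat eat?", "body": "She refuses food."}
candidate = top_candidate(new_question, past_questions)
```

Here `candidate` holds the similarity score and the dog question, whose answer would then be passed on to stage two for validation.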
8
STAGE TWO: TOP CANDIDATE VALIDATION
Train a classifier that validates whether A can be served as an answer to Qnew.
9
SURFACE-LEVEL FEATURES
Surface-level statistics: text length, number of question marks, stop word count, maximal IDF within all terms in the text, minimal IDF, average IDF, IDF standard deviation, http link count, number of figures.
Surface-level similarity: cosine similarity in a TF-IDF weighted word unigram vector space model, computed for the pairs:
    Qnew title - Qpast title
    Qnew body - Qpast body
    Qnew title+body - Qpast title+body
    Qnew title+body - Answer
    Qpast title+body - Answer
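The surface-level statistics listed above could be computed roughly as follows. This is a sketch under assumptions: the stop word list is a tiny placeholder, the IDF table is supplied by the caller, unknown terms get a default IDF of 0, and "number of figures" is interpreted here as the count of numeric tokens, which the slide does not confirm.

```python
import re
import statistics

# Tiny illustrative stop word list (an assumption, not the authors' list).
STOP_WORDS = {"a", "an", "the", "is", "my", "to", "of", "and", "in", "it"}


def surface_stats(text, idf, default_idf=0.0):
    """Return the surface-level statistics of one text as a dict."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Fall back to a single 0.0 so max/min/mean work on empty texts.
    idfs = [idf.get(t, default_idf) for t in tokens] or [0.0]
    return {
        "length": len(tokens),
        "question_marks": text.count("?"),
        "stop_words": sum(t in STOP_WORDS for t in tokens),
        "max_idf": max(idfs),
        "min_idf": min(idfs),
        "avg_idf": statistics.fmean(idfs),
        "std_idf": statistics.pstdev(idfs),
        "http_links": len(re.findall(r"https?://", text, flags=re.IGNORECASE)),
        # Assumed reading of "number of figures": runs of digits.
        "figures": len(re.findall(r"\d+", text)),
    }


# Hypothetical example text and IDF values.
example_stats = surface_stats(
    "Is my dog sick? He ate 2 bones, see http://example.com",
    {"dog": 2.0, "sick": 3.0},
)
```

The same function would be applied to Qnew, Qpast and the answer, giving one group of features per text.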
10
LINGUISTIC ANALYSIS
Latent topics: LDA (Latent Dirichlet Allocation)

            Qnew    Qpast   A
Topic 1     0.3     0.1     0.25
Topic 2     0.03    0.1     0.02
Topic 3     0.15    0.08    0.12
...
Topic n     0.06    0.13    0.05

Features: entropy, most probable topic, JS divergence.
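Given topic distributions such as those in the table, the three features can be computed as in the sketch below. It assumes each distribution is a list of probabilities that sums to 1 (the slide shows only a truncated excerpt) and uses base-2 logarithms, which is our choice; fitting the LDA model itself is out of scope here.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability topics contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be positive wherever p is positive."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_probable_topic(p):
    """Index of the topic with the highest probability."""
    return max(range(len(p)), key=p.__getitem__)
```

A low JS divergence between the topic distributions of Qnew and Qpast (or Qnew and A) suggests the two texts are about the same topics; with base-2 logarithms the value lies between 0 and 1.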
11
Lexico-syntactic analysis
Stanford dependency parser
Extracted elements: main verb, subject, object, the main noun and adjective.
Ex:
Q1: Why doesn’t my dog eat?
    Main predicate: eat
    Main predicate argument: dog
Q2: Why doesn’t my cat eat?
    Main predicate: eat
    Main predicate argument: cat
12
RESULT LIST ANALYSIS
Query clarity: language model & KL divergence
[Figure: example term distributions over the terms A, B, C, D for Qnew, Qpast1, Qpast2, Qpast3 and all past questions]
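Query clarity is usually defined as the KL divergence between a language model of the retrieved results and the language model of the whole collection. The sketch below is a simplified variant under assumptions: it pools the retrieved texts uniformly instead of weighting them by query likelihood, uses unsmoothed unigram models, and requires every result text to be part of the collection.

```python
import math
from collections import Counter


def unigram_lm(texts):
    """Maximum-likelihood unigram language model over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def clarity(result_texts, collection_texts):
    """KL(result model || collection model) in bits.

    Assumes result_texts is a subset of collection_texts, so every
    result term has a non-zero collection probability.
    """
    p_res = unigram_lm(result_texts)
    p_col = unigram_lm(collection_texts)
    return sum(p * math.log2(p / p_col[tok]) for tok, p in p_res.items())


# Hypothetical toy collection.
collection = ["dog food", "dog toys", "hair dye", "hair gel"]
focused_results = ["dog food", "dog toys"]
```

A result list concentrated on one topic (`focused_results`) gets a higher clarity than one that looks like the collection as a whole, whose clarity is 0.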
13
Query feedback: the informational similarity between two queries can be effectively estimated by the similarity between their ranked document lists.
Result list length: the number of questions that pass the threshold α.
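Both features can be illustrated with a short sketch. The overlap-at-k measure is one common way to compare two ranked lists and is our assumption; the slide does not say which list similarity is used. The cutoff `k` and the identifiers in the usage note are hypothetical.

```python
def overlap_at_k(ranked_a, ranked_b, k=10):
    """Fraction of shared items among the top k of two ranked lists.

    Lists shorter than k are still divided by k, so short lists
    cannot reach an overlap of 1.0.
    """
    top_a, top_b = set(ranked_a[:k]), set(ranked_b[:k])
    return len(top_a & top_b) / k


def result_list_length(title_scores, alpha):
    """Number of past questions whose title similarity passes alpha."""
    return sum(score >= alpha for score in title_scores)
```

For example, `overlap_at_k(["q1", "q2", "q3"], ["q3", "q1", "q9"], k=3)` compares the lists retrieved for two different queries, and `result_list_length([0.9, 0.2, 0.5], 0.5)` counts the past questions that survive the stage-one filter.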
14
CLASSIFIER MODEL
Random forest classifier
Each tree is trained on a random subset of the features and a random sample of the past questions in the training set.
16
OFFLINE
Dataset: Yahoo! Answers categories Beauty & Style, Health, and Pets.
Included best answers that were chosen by the askers and received at least three stars.
Between Feb and Dec 2010.
17
MTurk (Amazon Mechanical Turk) annotation
Agreement measure: Fleiss’s kappa
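Fleiss's kappa measures agreement among a fixed number of raters beyond what chance would give. The sketch below follows the standard formula; the input layout (one row per item, one column per category, each cell the number of raters choosing that category) is an assumption about how the judgments are stored, and the function name is ours.

```python
def fleiss_kappa(ratings):
    """Fleiss's kappa for a table of per-item category counts.

    ratings[i][j] is the number of raters who put item i in category j.
    Every row must sum to the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in one category.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(ratings[0])
    # Share of all assignments that went to each category.
    p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
             for j in range(n_cats)]
    # Per-item agreement: fraction of rater pairs that agree.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in ratings]
    p_bar = sum(p_item) / n_items          # observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean less agreement than chance.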
20
ONLINE
23
CONCLUSIONS
Short questions might suffer from vocabulary mismatch problems and sparsity.
Long, cumbersome descriptions introduce many irrelevant aspects that can hardly be separated from the essential question details (even by a human reader).
Terms that are repeated in the past question and in its best answer should usually be emphasized more, as they are related to the expressed need.
24
A general informative answer can satisfy a number of topically connected but different questions.
A general social answer may often satisfy a certain type of question.
In future work, we would like to better understand time-sensitive questions, such as those common in the Sports category.