LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS Date: 2012/11/22 Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor Source:

LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS
Date: 2012/11/22
Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor
Source: WWW ’12
Advisor: Dr. Jia-Ling Koh
Speaker: Yi-Hsuan Yeh

OUTLINE
Introduction
Description of approach
  Stage one: top candidate selection
  Stage two: top candidate validation
Experiment
  Offline
  Online
Conclusion

INTRODUCTION
Users struggle to express their information need as a short query.

INTRODUCTION
Community-based Question Answering (CQA) sites, such as Yahoo! Answers or Baidu Zhidao.
A question consists of a title and a body.
15% of the questions remain unanswered.
Idea: answer new questions with the answers to past resolved questions.


A TWO-STAGE APPROACH
Stage one finds the past question most similar to the new question.
Stage two decides whether or not to serve that past question's answer.

STAGE ONE: TOP CANDIDATE SELECTION
Vector-space unigram model with TF-IDF weights.
Filtering (title): TF-IDF vectors (w1, w2, w3, ..., wn) of Qnew and Qpast 1, Qpast 2, ..., Qpast n; cosine similarity => threshold α.
Ranking: Cos(Qpast title+body, Qnew title+body) => the top candidate past question and its answer A.
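The selection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the dictionary layout of a question, and the function names are all assumptions, and reading the slide as "filter on titles with α, then rank on title+body" is an interpretation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    return text.lower().split()


def build_idf(docs):
    """Return a function mapping a term to a smoothed IDF weight."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    # Counter returns 0 for unseen terms, so they get the largest weight.
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def tfidf(text, idf):
    return {t: f * idf(t) for t, f in Counter(tokenize(text)).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def full_text(question):
    return question["title"] + " " + question["body"]


def select_top_candidate(new_q, past_qs, alpha):
    """Return (best past question or None, number of questions passing alpha)."""
    idf = build_idf([full_text(q) for q in past_qs])
    new_title = tfidf(new_q["title"], idf)
    new_full = tfidf(full_text(new_q), idf)
    # Keep past questions whose title similarity reaches the threshold alpha.
    passed = [q for q in past_qs
              if cosine(new_title, tfidf(q["title"], idf)) >= alpha]
    if not passed:
        return None, 0
    # Rank the survivors by title+body similarity and keep the top one.
    best = max(passed, key=lambda q: cosine(new_full, tfidf(full_text(q), idf)))
    return best, len(passed)
```

The second return value is the number of past questions that pass α, which is reused later as the result list length feature.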

STAGE TWO: TOP CANDIDATE VALIDATION
Train a classifier that validates whether A can be served as an answer to Qnew.

SURFACE-LEVEL FEATURES
Surface-level statistics: text length, number of question marks, stop word count, maximal IDF within all terms in the text, minimal IDF, average IDF, IDF standard deviation, http link count, number of figures.
Surface-level similarity: cosine similarity in a TF-IDF weighted word unigram vector space model, computed between:
  Qnew title vs. Qpast title
  Qnew body vs. Qpast body
  Qnew title+body vs. Qpast title+body
  Qnew title+body vs. Answer
  Qpast title+body vs. Answer
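The surface-level statistics listed above can be computed roughly as follows. This is a sketch under stated assumptions: text length is counted in tokens, "figures" is read as numeric tokens, the stop word list is a tiny placeholder, and the `idf` argument is a hypothetical precomputed term-to-IDF dictionary.

```python
import math
import re

# Assumption: a tiny placeholder list; a real system would use a full one.
STOP_WORDS = {"the", "a", "an", "is", "my", "to", "of", "and", "why"}


def surface_stats(text, idf):
    """Compute surface-level statistics for one text (title, body or answer)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    idfs = [idf.get(t, 0.0) for t in tokens]  # unknown terms get IDF 0.0
    mean = sum(idfs) / len(idfs) if idfs else 0.0
    variance = sum((x - mean) ** 2 for x in idfs) / len(idfs) if idfs else 0.0
    return {
        "length": len(tokens),
        "question_marks": text.count("?"),
        "stop_words": sum(t in STOP_WORDS for t in tokens),
        "max_idf": max(idfs, default=0.0),
        "min_idf": min(idfs, default=0.0),
        "avg_idf": mean,
        "std_idf": math.sqrt(variance),
        "links": len(re.findall(r"https?://", text)),
        "figures": sum(t.isdigit() for t in tokens),  # assumption: numeric tokens
    }
```

Each of Qnew, Qpast and A would get its own dictionary of statistics, and the values would be concatenated into the classifier's feature vector.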

LINGUISTIC ANALYSIS
Latent topics: LDA (Latent Dirichlet Allocation) gives a distribution over topics (Topic 1, ..., Topic n) for each of Qnew, Qpast and A.
Features: entropy, most probable topic, JS divergence.
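Given topic distributions for two texts, the three topic features are straightforward to compute. The sketch below assumes the distributions have already been inferred by some LDA model and are passed in as plain lists of probabilities; the function names are illustrative, and base-2 logarithms are a choice made here so that JS divergence lies in [0, 1].

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a topic distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 with log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def most_probable_topic(p):
    """Index of the topic with the highest probability."""
    return max(range(len(p)), key=p.__getitem__)
```

A low JS divergence between, say, Qnew and A suggests the two texts are about the same topics, while a low entropy suggests a text focused on few topics.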

Lexico-syntactic analysis
Stanford dependency parser: main verb, subject, object, the main noun and adjective.
Ex:
Q1: Why doesn’t my dog eat?
  Main predicate: eat
  Main predicate argument: dog
Q2: Why doesn’t my cat eat?
  Main predicate: eat
  Main predicate argument: cat

RESULT LIST ANALYSIS
Query clarity: compare the language model of the past questions retrieved for Qnew (Qpast 1, Qpast 2, Qpast 3, ...) with the language model of all past questions, using KL divergence.
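A clarity-style score can be sketched as the KL divergence between a unigram language model of the retrieved questions and one of the whole collection. This is a simplified illustration: it uses unsmoothed maximum-likelihood models and whitespace tokens, and it assumes every retrieved text is also part of the collection so that no collection probability is zero. The function names are hypothetical.

```python
import math
from collections import Counter


def language_model(texts):
    """Maximum-likelihood unigram model over a list of texts."""
    counts = Counter(token for text in texts for token in text.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def clarity(result_texts, collection_texts):
    """KL divergence (bits) of the result-list model from the collection model.

    Assumes result_texts is a subset of collection_texts, so every term in
    the result-list model also has a non-zero collection probability.
    """
    p = language_model(result_texts)
    q = language_model(collection_texts)
    return sum(pw * math.log2(pw / q[term]) for term, pw in p.items())
```

A higher score means the result list uses vocabulary that is distinctive relative to the collection, which is usually taken as a sign of a focused, unambiguous query.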

Query feedback: the informational similarity between two queries can be effectively estimated by the similarity between their ranked document lists.
Result list length: the number of questions that pass the threshold α.
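One simple way to compare two ranked lists is the overlap of their top-k items. The sketch below is an assumption, since the slide does not say which list-similarity measure is used or how the second query is formed; the identifiers and the choice of k are illustrative. The result list length is included for completeness.

```python
def overlap_at_k(ranked_a, ranked_b, k):
    """Fraction of the top-k items shared by two ranked lists of question ids."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


def result_list_length(title_similarities, alpha):
    """Number of past questions whose similarity passes the threshold alpha."""
    return sum(score >= alpha for score in title_similarities)
```

For example, lists `["q1", "q2", "q3"]` and `["q2", "q1", "q9"]` share two of their top three items.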

CLASSIFIER MODEL
Random forest classifier: each tree is trained on a random subset of the features and a random sample of the training past questions.
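To make the two sources of randomness concrete, here is a deliberately tiny forest in pure Python. It is a teaching sketch, not the configuration used in the talk: every tree is a depth-one decision stump, and the number of trees, the number of features per tree, and the function names are assumptions.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample of the training examples (with replacement).
        sample = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of the features for this tree.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample],
                                  feature_ids))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

In the validation stage, each row would be the feature vector of a (Qnew, Qpast, A) triple and the label would say whether A is a valid answer to Qnew.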


OFFLINE
Dataset: Yahoo! Answers categories Beauty & Style, Health, and Pets.
Included questions whose best answer was chosen by the asker and received at least three stars.
Between Feb and Dec

MTurk annotation; agreement among annotators measured with Fleiss’s kappa.
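Fleiss's kappa measures agreement among a fixed number of raters per item, corrected for chance agreement. The sketch below follows the standard formula; the input layout (one row per item, one column per category, each cell a count of raters) and the function name are choices made for this example, and no data from the talk is reproduced.

```python
def fleiss_kappa(table):
    """table[i][j] = number of raters who assigned item i to category j.

    Every row must sum to the same number of raters (at least two).
    Undefined (division by zero) when all ratings fall in one category.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                for row in table]
    observed = sum(per_item) / n_items
    # Chance agreement from the overall category proportions.
    proportions = [sum(col) / (n_items * n_raters) for col in zip(*table)]
    expected = sum(p * p for p in proportions)
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean less agreement than chance.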


ONLINE



CONCLUSIONS
Short questions might suffer from vocabulary mismatch problems and sparsity.
Long, cumbersome descriptions introduce many irrelevant aspects that can hardly be separated from the essential question details (even by a human reader).
Terms that are repeated in the past question and in its best answer should usually be emphasized more, as they are related to the expressed need.

A general informative answer can satisfy a number of topically connected but different questions.
A general social answer may often satisfy a certain type of question.
In future work, we would like to better understand time-sensitive questions, such as those common in the Sports category.