Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online.


Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online Forums

OUTLINE Introduction; Algorithms for question detection; Algorithms for answer detection (preliminary methods, graph-based propagation method); Experiments

INTRODUCTION Online forums contain a huge amount of valuable user-generated content, and it is highly desirable to extract and reuse the human knowledge in that content. An investigation of 40 forums found that 90% of them contain question-answer knowledge. This paper focuses on the problem of extracting question-answer pairs from forums.

INTRODUCTION Mining question-answer pairs from forums has the following applications: enriching the knowledge base of community-based Question Answering (CQA) services; improving access to forum content by querying the question-answer pairs extracted from forums; and augmenting the knowledge base of chatbots.

INTRODUCTION Question-answer pairs embedded in forums are largely unstructured. Question detection: simple rules based on question marks and 5W1H question words are not adequate for forum data. This paper proposes a classification method based on sequential patterns to detect questions.

INTRODUCTION Answer detection: multiple questions and answers may be discussed in parallel and are often interwoven. One approach is to consider each candidate answer as an isolated document and the question as a query. The relationships between answers can also be modeled to form a graph. For each candidate answer, an initial score of being a true answer can be computed using a ranking method.

ALGORITHMS FOR QUESTION DETECTION Question marks and 5W1H words are not adequate. Questions can be expressed as imperative sentences, and question marks are often omitted, as in "I am wondering where I can buy cheap and good clothing in beijing." Conversely, short informal expressions such as "really?" should not be regarded as questions. To complement these rules, this paper extracts labeled sequential patterns to identify question sentences.

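ALGORITHMS FOR QUESTION DETECTION Labeled sequential patterns (LSPs). Example: "i want to buy an office software and wonder which software company is best." → "wonder which ... is". An LSP has the form LHS → c, where LHS is a sequence and c is a class label. A sequence $s_1 = \langle a_1, \ldots, a_m \rangle$ is contained in $s_2 = \langle b_1, \ldots, b_n \rangle$ if there exist integers $1 \le i_1 < \cdots < i_m \le n$ such that $a_1 = b_{i_1}, \ldots, a_m = b_{i_m}$, and the distance between the two adjacent items $b_{i_j}$ and $b_{i_{j+1}}$ in $s_2$ is less than a threshold (we used 5). An LSP p1 is contained by p2 if the sequence p1.LHS is contained by p2.LHS and p1.c = p2.c.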
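The containment test can be sketched in a few lines of Python. This is only an illustration of the definition on the slide, not the authors' implementation; the function name `contains` and the parameter name `max_gap` are made up for this example.

```python
def contains(pattern, sentence, max_gap=5):
    """Return True if `pattern` occurs in `sentence` as a subsequence whose
    adjacent matched positions are less than `max_gap` apart.

    `pattern` and `sentence` are lists of items (words or POS tags).
    `max_gap` plays the role of the distance threshold (5 on the slide).
    """
    def search(p_idx, prev_pos):
        # All pattern items matched: the pattern is contained.
        if p_idx == len(pattern):
            return True
        if prev_pos is None:
            # The first item may match anywhere in the sentence.
            start, stop = 0, len(sentence)
        else:
            # Later items must lie within the allowed distance.
            start = prev_pos + 1
            stop = min(len(sentence), prev_pos + max_gap)
        for pos in range(start, stop):
            if sentence[pos] == pattern[p_idx] and search(p_idx + 1, pos):
                return True
        return False

    return search(0, None)


words = "i want to buy an office software and wonder which software company is best".split()
print(contains(["wonder", "which", "is"], words))
```

The backtracking search matters because a greedy left-to-right match can pick an early occurrence of an item that later violates the distance constraint, even though a later occurrence would satisfy it.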

ALGORITHMS FOR QUESTION DETECTION To mine LSPs, each sentence is first pre-processed with a Part-Of-Speech (POS) tagger (the MXPOST Toolkit), keeping keywords such as the 5W1H words, modal words, "wonder", and "any", and replacing the other words with their POS tags. For example, "where can I find a job" → "where can PRP VB DT NN". The combination of POS tags and keywords allows us to capture representative features for question sentences by mining LSPs, such as patterns of the form LHS → Q.
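A minimal sketch of this pre-processing step, assuming the sentence has already been POS-tagged (the slides use MXPOST; no tagger is called here). The keyword set `KEYWORDS` and the function name `to_lsp_items` are illustrative assumptions, not the list used in the paper.

```python
# Hypothetical keyword list: 5W1H words, a few modal words, "wonder", "any".
KEYWORDS = {
    "what", "when", "where", "who", "why", "how",
    "can", "could", "would", "should",
    "wonder", "any",
}


def to_lsp_items(tagged_sentence, keywords=KEYWORDS):
    """Keep keywords as words and replace every other word by its POS tag.

    `tagged_sentence` is a list of (word, tag) pairs produced by any tagger.
    """
    return [
        word.lower() if word.lower() in keywords else tag
        for word, tag in tagged_sentence
    ]


tagged = [("where", "WRB"), ("can", "MD"), ("I", "PRP"),
          ("find", "VB"), ("a", "DT"), ("job", "NN")]
print(" ".join(to_lsp_items(tagged)))  # where can PRP VB DT NN
```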

ALGORITHMS FOR QUESTION DETECTION The support sup(p) is the percentage of tuples in database D that contain the LSP p. The confidence is $\mathrm{conf}(p) = \mathrm{sup}(p) / \mathrm{sup}(p.\mathrm{LHS})$. Example: an LSP p1 has support 66.7% and confidence 100%, while an LSP p2 has support 66.7% and confidence 66.7%, so p1 is a better indication of class Q than p2. In our experiments, we empirically set minimum support at 0.5% and minimum confidence at 85%.
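To make support and confidence concrete, here is a small worked sketch. The three-tuple database, the class label "NQ" for non-questions, and all function names are assumptions chosen so that the numbers match the 66.7% / 100% / 66.7% figures quoted above. The distance threshold is ignored for brevity.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(lhs, label, database):
    """Fraction of tuples with class `label` whose sequence contains `lhs`."""
    hits = sum(1 for seq, c in database if c == label and is_subsequence(lhs, seq))
    return hits / len(database)


def lhs_support(lhs, database):
    """Fraction of tuples whose sequence contains `lhs`, whatever the class."""
    return sum(1 for seq, _ in database if is_subsequence(lhs, seq)) / len(database)


def confidence(lhs, label, database):
    """conf(p) = sup(p) / sup(p.LHS); 0.0 if the LHS never occurs."""
    denom = lhs_support(lhs, database)
    return support(lhs, label, database) / denom if denom else 0.0


# Hypothetical toy database of (sequence, class label) tuples.
database = [
    (("a", "d", "e", "f"), "Q"),
    (("a", "f", "e", "f"), "Q"),
    (("d", "a", "f"), "NQ"),
]

p1 = ("a", "e", "f")   # p1: <a, e, f> -> Q
p2 = ("a", "f")        # p2: <a, f> -> Q
for name, lhs in (("p1", p1), ("p2", p2)):
    print(name, round(support(lhs, "Q", database), 3),
          round(confidence(lhs, "Q", database), 3))
```

Both patterns cover two of the three tuples, but the left-hand side of p2 also matches the non-question tuple, which is why its confidence drops to two thirds.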

ALGORITHMS FOR ANSWER DETECTION Input: a forum thread with the questions annotated. Output: a list of ranked candidate answers for each question. For each question, we assume its set of candidate answers to be the paragraphs in the posts that follow the question. Two observations motivate this: paragraphs are usually good answer segments in forums, and the answers to a question usually appear in the posts after the post containing the question.
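The candidate-generation rule can be sketched as follows, assuming a thread is a list of post strings in posting order and that paragraphs are separated by blank lines. Both the representation and the function name `candidate_answers` are assumptions made for illustration.

```python
def candidate_answers(posts, question_post_index):
    """Collect the paragraphs of all posts that follow the question's post."""
    candidates = []
    for post in posts[question_post_index + 1:]:
        for paragraph in post.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:  # skip empty fragments
                candidates.append(paragraph)
    return candidates


thread = [
    "Where can I buy cheap clothing in Beijing",
    "Try the Silk Market.\n\nBargain hard.",
    "The Zoo Market is cheaper.",
]
print(candidate_answers(thread, 0))
```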

ALGORITHMS FOR ANSWER DETECTION Three IR methods are used to rank candidate answers: cosine similarity, where f(w, X) denotes the frequency of word w in X; the query likelihood language model; and the KL-divergence language model.
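As an illustration of the first ranking method, the sketch below computes cosine similarity over raw word frequencies f(w, X). The slide's exact formula did not survive, so whitespace tokenization, lowercasing, and the absence of any term weighting are assumptions here.

```python
import math
from collections import Counter


def cosine_similarity(question, answer):
    """Cosine of the angle between the word-frequency vectors of two texts."""
    fq = Counter(question.lower().split())  # f(w, q)
    fa = Counter(answer.lower().split())    # f(w, a)
    dot = sum(fq[w] * fa[w] for w in fq)
    norm = (math.sqrt(sum(v * v for v in fq.values()))
            * math.sqrt(sum(v * v for v in fa.values())))
    return dot / norm if norm else 0.0


def rank_candidates(question, candidates):
    """Return the candidates sorted by decreasing similarity to the question."""
    return sorted(candidates, key=lambda a: cosine_similarity(question, a), reverse=True)


print(rank_candidates("cheap hotel near airport",
                      ["the weather is nice", "a cheap hotel is near the airport"]))
```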

ALGORITHMS FOR ANSWER DETECTION The candidate answers for a question are not independent in forums. Graph-based propagation method: given a question q and the set $A_q$ of its candidate answers, build a weighted directed graph denoted (V, E). Given two candidate answers $a_0$ and $a_g$, use $KL(a_0 | a_g)$ to determine whether there will be an edge $a_0 \to a_g$: if $1 / (1 + KL(a_0 | a_g)) > \mu$, an edge is formed from $a_0$ to $a_g$.
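The edge-creation rule can be sketched as below. The slide does not say how the language models are estimated, so the add-one smoothed unigram models over a shared vocabulary are an assumption, as are the function names and the value of `mu`.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over `vocab` (an assumption)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p | q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def build_edges(answers, mu=0.9):
    """Return {(i, j): 1 / (1 + KL(a_i | a_j))} for every pair above `mu`."""
    vocab = {w for a in answers for w in a.lower().split()}
    models = [unigram_model(a, vocab) for a in answers]
    edges = {}
    for i, p in enumerate(models):
        for j, q in enumerate(models):
            if i == j:
                continue
            similarity = 1.0 / (1.0 + kl_divergence(p, q))
            if similarity > mu:
                edges[(i, j)] = similarity
    return edges


answers = [
    "take the metro to the airport",
    "take the metro to the airport",
    "visa rules changed last year",
]
print(sorted(build_edges(answers)))
```

Because KL-divergence is asymmetric, the rule can produce an edge in one direction only, which is why the graph is directed.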

ALGORITHMS FOR ANSWER DETECTION Computing edge weights. Two factors are considered: the distance between a candidate answer and the question, denoted by d(q, a); and the author of the answer, characterized by the number of the author's replying posts and the number of threads initiated by the author.

ALGORITHMS FOR ANSWER DETECTION The edge weights are normalized as in the PageRank algorithm. Computing propagated scores: two variants are considered, propagation without initial score and propagation with initial score.
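The propagation formulas are not reproduced in the transcript, so the following is only a generic PageRank-style sketch under stated assumptions: outgoing edge weights are normalized to sum to one per node, a damping factor of 0.85 is used, and the "with initial score" variant is modeled by using the normalized initial ranking scores as the teleport distribution. None of these choices is confirmed by the slides.

```python
def propagate(edges, n, initial=None, damping=0.85, iterations=50):
    """PageRank-style score propagation over a weighted directed graph.

    `edges` maps (src, dst) to a positive weight; nodes are 0 .. n-1.
    `initial` is an optional list of positive initial scores; when it is
    omitted, a uniform distribution is used (propagation without initial score).
    """
    # Total outgoing weight per node, used to normalize edge weights.
    out_total = [0.0] * n
    for (src, _), weight in edges.items():
        out_total[src] += weight

    if initial is None:
        base = [1.0 / n] * n
    else:
        total = sum(initial)
        base = [score / total for score in initial]

    scores = base[:]
    for _ in range(iterations):
        new_scores = [(1.0 - damping) * b for b in base]
        for (src, dst), weight in edges.items():
            new_scores[dst] += damping * scores[src] * weight / out_total[src]
        # Nodes without outgoing edges redistribute their score via `base`.
        dangling = sum(scores[i] for i in range(n) if out_total[i] == 0.0)
        for i in range(n):
            new_scores[i] += damping * dangling * base[i]
        scores = new_scores
    return scores


edges = {(0, 2): 1.0, (1, 2): 1.0}
print(propagate(edges, 3))
print(propagate(edges, 3, initial=[0.7, 0.2, 0.1]))
```

In the toy graph, candidate 2 receives score from both other candidates and therefore ends up ranked first in the uniform variant.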

EXPERIMENTS Data: 1,212,153 threads from the TripAdvisor forum, 86,772 threads from the LonelyPlanet forum, and 25,298 threads from the BootsnAll Network. From the TripAdvisor data, we randomly sampled 650 threads. Two annotators were asked to tag questions and their answers in each thread.

EXPERIMENTS In Q-TUnion, a sentence was labeled as a question if it was marked as a question by either annotator; in Q-TInter, a sentence was labeled as a question only if both annotators marked it as a question.
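The two label sets correspond to a set union and a set intersection of the annotators' decisions. A minimal sketch, assuming each annotator's output is a set of sentence identifiers (the identifiers and the function name are hypothetical):

```python
def merge_annotations(annotator_a, annotator_b):
    """Build the two question label sets from two annotators' markings."""
    return {
        "Q-TUnion": annotator_a | annotator_b,  # marked by either annotator
        "Q-TInter": annotator_a & annotator_b,  # marked by both annotators
    }


labels = merge_annotations({"s1", "s2", "s5"}, {"s2", "s5", "s7"})
print(sorted(labels["Q-TUnion"]), sorted(labels["Q-TInter"]))
```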

EXPERIMENTS