A Classification-based Approach to Question Answering in Discussion Boards. Liangjie Hong and Brian D. Davison, Department of Computer Science and Engineering, Lehigh University. SIGIR 2009.

Presentation transcript:

Outline
- Introduction
- Related Work
- Problem Definition
- Classification Methods
- Experiments
- Conclusion

Introduction
- Online users share ideas, discuss issues, and form communities within discussion boards (online forums)
- Knowledge discovery and information extraction
- Several potential applications of mining QA content:
  - Search engines
  - Online QA services
  - Experts in social media
  - Knowledge bases for automatic chat-bots

Related Work
- Cong et al., 2008
  - Developed a classification-based method for question detection, using sequential pattern features extracted from both questions and non-questions in forums
  - Preprocessing applies a POS tagger while keeping 5W1H and modal words
  - Drawback: time-consuming
  - Focuses on question sentences or question paragraphs

Related Work(cont’d)
- Knowledge acquisition from discussion boards: Zhou and Hovy, 2005; Feng et al., 2006
- Using non-textual features such as click count to predict the quality of answers: Jeon et al., 2006
- In general, this related work does not need to detect questions

Tasks
- Identifying question-related first posts
- Finding potential answers in subsequent responses within the corresponding threads
- Some questions…

Tasks(cont’d)
Some questions:
- Can we detect question-related threads in an efficient and effective manner?
- What other features can be used to improve the performance?
- How much can combinations of some simple heuristics improve performance?
- Are traditional relevance-based approaches suitable for this QA content?

Problem Definition
- Questions
  - Focus on finding whether the first post is a question post
  - Treat the whole post as a question post:

Problem Definition(cont’d)
- Answers
  - If one of the reply posts contains answers to the questions posed in the first post, regard that reply as an answer post
  - Also treat as answer posts those replies that do not contain the actual answer content but provide links to other potential answers
- Result from the system: question-answer post pairs
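
The problem definition above can be sketched as a small data model. This is a minimal illustration, not the authors' implementation: the names `Post`, `Thread`, and `extract_qa_pairs` are hypothetical, and the two classifiers are passed in as plain callables so that any question detector and answer detector can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Post:
    author: str
    text: str


@dataclass
class Thread:
    posts: List[Post]  # posts[0] is the first post; the rest are replies


def extract_qa_pairs(
    thread: Thread,
    is_question: Callable[[Post], bool],
    is_answer: Callable[[Post, Post], bool],
) -> List[Tuple[Post, Post]]:
    """Return (question post, answer post) pairs for one thread.

    The whole first post is treated as the question; each reply judged to
    be an answer yields one pair.
    """
    if not thread.posts:
        return []
    first = thread.posts[0]
    if not is_question(first):
        return []
    return [(first, reply) for reply in thread.posts[1:] if is_answer(first, reply)]
```

A thread whose first post is not classified as a question yields no pairs, which mirrors the two-stage setup: question detection first, answer detection only within question-related threads.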

Classification Methods(1/3)
- NTU CSIE LIBSVM 2.88
- Question detection:
  - Question mark
  - 5W1H words
  - Total number of posts within one thread
  - Authorship
  - N-gram
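
The question detection features listed above could be extracted roughly as follows. This is a sketch under stated assumptions: the tokenizer, the exact 5W1H vocabulary, and the encoding of authorship (counting later posts written by the thread starter) are guesses, since the slide names the features without defining them. The resulting dictionary would still have to be converted to the sparse vector format that an SVM library expects.

```python
import re
from collections import Counter

# 5W1H words as commonly defined; the slides do not list the exact vocabulary.
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}


def tokenize(text):
    """Lowercase and keep alphanumeric runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def question_features(first_post, thread_authors, n=2):
    """Build a feature dict for the first post of a thread.

    first_post: text of the first post.
    thread_authors: author of every post in the thread, in order
                    (index 0 is the thread starter).
    """
    tokens = tokenize(first_post)
    features = {
        "question_marks": first_post.count("?"),
        "5w1h_count": sum(1 for t in tokens if t in FIVE_W_ONE_H),
        "num_posts": len(thread_authors),
        # Assumed authorship encoding: later posts written by the starter.
        "starter_replies": sum(
            1 for a in thread_authors[1:] if a == thread_authors[0]
        ),
    }
    # Word n-gram counts, prefixed so they cannot clash with named features.
    ngrams = Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    for gram, count in ngrams.items():
        features["ng:" + gram] = count
    return features
```

All of these features are cheap to compute and need no POS tagging, which is the contrast with the sequential-pattern approach that the slides draw.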

Classification Methods(2/3)
- Answer detection:
  - The position of the answer post
  - Authorship
  - N-gram
  - Stop words
  - Query likelihood model score
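
A rough sketch of three of the answer detection features above: position, authorship, and the query likelihood model score. The slide does not say how the language model is smoothed; Dirichlet smoothing with `mu=2000` is an assumption made here, and the function names are hypothetical. N-gram and stop-word features are left out for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def query_likelihood(question, reply, collection_counts, collection_len, mu=2000.0):
    """log P(question | reply) under a unigram model of the reply.

    Dirichlet smoothing and the value of mu are assumptions. The collection
    must be non-empty (collection_len > 0).
    """
    reply_counts = Counter(tokenize(reply))
    reply_len = sum(reply_counts.values())
    score = 0.0
    for term in tokenize(question):
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never seen in the collection: skip it
        score += math.log((reply_counts[term] + mu * p_coll) / (reply_len + mu))
    return score


def answer_features(question, replies, authors, collection_counts, collection_len):
    """One feature dict per reply.

    authors[0] wrote the question; authors[i + 1] wrote replies[i].
    """
    rows = []
    for i, reply in enumerate(replies):
        rows.append({
            "position": i + 1,  # 1 = first reply in the thread
            "by_question_author": int(authors[i + 1] == authors[0]),
            "ql_score": query_likelihood(
                question, reply, collection_counts, collection_len
            ),
        })
    return rows
```

Treating the question as the query and each reply as a document puts the relevance score on the same footing as the non-content features, so a classifier can weigh them together.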

Classification Methods(3/3)
- Cong et al., 2008
- Sequential pattern mining
- Graph-based model
- Query likelihood language model
- KL-divergence language model
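
For the KL-divergence language model named above, a minimal scoring sketch is given here. It ranks replies by the negative KL divergence between a maximum-likelihood model of the question and a smoothed model of the reply. The Dirichlet smoothing, the value of `mu`, and the function names are assumptions; the slide gives no details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def neg_kl_divergence(question, reply, collection_counts, collection_len, mu=2000.0):
    """Return -KL(question model || reply model); higher means a closer match.

    The question model is the maximum-likelihood estimate; the reply model
    is Dirichlet-smoothed against the collection (both are assumptions).
    The collection must be non-empty (collection_len > 0).
    """
    q_counts = Counter(tokenize(question))
    q_len = sum(q_counts.values())
    r_counts = Counter(tokenize(reply))
    r_len = sum(r_counts.values())
    score = 0.0
    for term, count in q_counts.items():
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term never seen in the collection: skip it
        p_q = count / q_len
        p_r = (r_counts[term] + mu * p_coll) / (r_len + mu)
        score -= p_q * math.log(p_q / p_r)
    return score
```

With a maximum-likelihood question model, this ordering of replies is the same as the one produced by query likelihood, because the two scores differ only by a per-question constant and a positive scale factor. The KL form becomes useful when the question model itself is smoothed or expanded.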

Experiments(1/9)
- Data crawled:
  - 555,954 threads from the Ubuntu dataset
  - 721,422 threads from Photography On The Net
- Question detection task: randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC dataset
- Answer detection task: randomly sampled 500 question-related threads from both datasets

Experiments(2/9)

Experiments(3/9)

Experiments(4/9)

Experiments(5/9)

Experiments(6/9)

Experiments(7/9)

Experiments(8/9)
- Propose a ranking scheme
- Ranking score:
  - V1: position + authorship
  - V2: position
  - V3: authorship
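
The transcript names the three variants but does not preserve the scoring formula, so the following is only a hypothetical illustration of how position and authorship could be combined into a ranking. The reciprocal-position term, the zero score for replies by the question author, and the product used for V1 are all assumptions, not the authors' definition.

```python
def rank_replies(replies, question_author, variant="V1"):
    """Rank (position, author) tuples, best candidate answer first.

    position starts at 1 for the first reply. All scoring choices below
    are hypothetical; the slides do not give the actual formula.
    """
    def score(position, author):
        pos_part = 1.0 / position  # assumed: earlier replies score higher
        # Assumed: a reply by the question author is unlikely to be an answer.
        auth_part = 0.0 if author == question_author else 1.0
        if variant == "V1":
            return pos_part * auth_part  # position + authorship
        if variant == "V2":
            return pos_part              # position only
        if variant == "V3":
            return auth_part             # authorship only
        raise ValueError("unknown variant: " + variant)

    # sorted() is stable, so ties keep their original thread order.
    return sorted(replies, key=lambda r: score(*r), reverse=True)
```

For example, `rank_replies([(1, "alice"), (2, "bob")], "alice")` places bob's reply first under V1, because the first reply was written by the asker.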

Experiments(9/9)

Conclusion
- The use of N-grams and the combination of several non-content features can improve performance
- Relevance-based retrieval methods alone are not effective in tackling the problem, but their performance can be improved by combining them with non-content features
- Designed a simple ranking scheme that outperforms previous approaches

- Combine several potential answers to make a better answer?
- A good understanding of question-answering interaction in discussion boards

Thank you!