A Classification-based Approach to Question Routing in Community Question Answering
Tom Chao Zhou, Michael R. Lyu, Irwin King

Presentation transcript:

1 A Classification-based Approach to Question Routing in Community Question Answering
Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2
1 The Chinese University of Hong Kong
2 AT&T Labs Research
Workshop on Community Question Answering on the Web, in conjunction with World Wide Web 2012
April 17, 2012

2 Introduction
Problem Definition and Feature
Experiments
Related Work
Conclusions and Future Work

3 Community-based Question Answering
Knowledge dissemination and information seeking
Natural language questions
Explicit, self-contained answers

4 How CQA Works
CQA users submit a question. If the question gets answers, the asker selects an answer and the question is resolved; if not, the question is not resolved.
The number of posted questions grows fast. Can users get their questions resolved within a reasonable period?

5 Whether Questions Get Resolved
Randomly sample 140 questions from each category in Yahoo! Answers
26 top-level categories
3,640 questions in total
Track the status of each question

6 [Figure: Percentage of Questions Resolved. Values shown: 19.95%, 24.75%, 26.48%, 27.31%, 51.32%, 61.92%, 63.41%, 64.45%]

7 How CQA Works
CQA users submit a question. If the question gets answers, the asker selects an answer and the question is resolved; if not, the question is not resolved.
What if we carefully select a set of CQA users who may be interested in the question?

8 Question Routing
Definition
–Routing open questions to suitable answerers who may be interested in the question
–For each candidate user, decide: interested in the question (yes) or not interested in the question (no)

9 Question Routing
Benefits
–Asker's perspective: reduces the time lag between posting a question and getting it answered
–Answerer's perspective: users are more enthusiastic about answering questions that interest them
–CQA service's perspective: makes better use of users' passion for answering, which improves the service and strengthens users' engagement with and loyalty to the system


11 Problem Definition
Question Routing Problem
–Given a question and a user in CQA, determine whether the user will contribute his/her knowledge to answer the question

12 Feature Investigation
Local Features
–Only local information about the question, the user's history, and the question-user relationship is needed
Global Features
–Take into account the global information of the CQA service
–The category is used as the global information
–Questions in the same category discuss similar topics
–Incorporating global information has a smoothing effect

13 Feature Investigation Summary
# of features     Question   User History   Question-User Relationship
Local Features    3          10             7
Global Features   3          2              1

14 Local Features
Question (3 features)
–Question length (Agichtein et al. found question length to be an important feature for measuring question quality)
1. Title length
2. Detail length
–Question type
3. 5W1H type (why, what, where, who, and how)
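The three question features above could be extracted roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `question_features`, measuring length in whitespace-separated words, and taking the first question word in the title as the type are all assumptions. The word "when" is also added as an assumption to complete the usual 5W1H set, since the slide's list names only five words.

```python
import re

# Assumption: "when" is included to complete the usual 5W1H set;
# the slide itself lists only why, what, where, who, and how.
QUESTION_WORDS = ("why", "what", "where", "who", "when", "how")


def question_features(title: str, detail: str) -> dict:
    """Return the three local question features for one question."""
    tokens = re.findall(r"[a-z']+", title.lower())
    # Assumption: the question type is the first 5W1H word in the title,
    # or "other" when the title contains none of them.
    q_type = next((t for t in tokens if t in QUESTION_WORDS), "other")
    return {
        # Assumption: length is counted in whitespace-separated words.
        "title_length": len(title.split()),
        "detail_length": len(detail.split()),
        "question_type": q_type,
    }


features = question_features(
    "How do I reset my router?",
    "It keeps dropping the connection every hour.",
)
print(features)
```

A categorical type such as `question_type` would still need to be encoded numerically (for example, one indicator per question word) before being passed to a classifier.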

15 Local Features
User History (10 features)
–A user's history has implications for the user's interests and behaviors
–Profile, question, and answering behaviors
1. Member since
2. Percentage of best answers
3. Total points
4. Number of answers
5. Number of best answers
6. Number of asked questions
7. Number of resolved questions

16 Local Features
User History (10 features, continued)
8. Number of stars received
9. Answer/question ratio
10. Best answer/question ratio
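The ratio features in this list (2, 9, and 10) can be derived from the raw counts (4, 5, and 6). The sketch below assumes the obvious definitions, which the slides do not spell out: best answers divided by answers, answers divided by asked questions, and best answers divided by asked questions. The function name and the choice to return 0.0 for an empty denominator are likewise illustrative.

```python
def user_history_ratios(num_answers: int, num_best_answers: int,
                        num_asked: int) -> dict:
    """Derive the ratio-style user history features from raw counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Assumption: a user with no answers or no questions gets 0.0
        # rather than an undefined ratio.
        return numerator / denominator if denominator else 0.0

    return {
        "best_answer_pct": safe_div(num_best_answers, num_answers),
        "answer_question_ratio": safe_div(num_answers, num_asked),
        "best_answer_question_ratio": safe_div(num_best_answers, num_asked),
    }


print(user_history_ratios(num_answers=40, num_best_answers=10, num_asked=8))
```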

17 Local Features
Question-User Relationship (7 features)
–Capture the relationship between a question and a user
–Feature adapted from the existing CQA service
1. Top contributor
–Features that measure how interested the user is in the category the given question belongs to
2. Ratio of answered questions in the category
3. Ratio of best-answered questions in the category
4. Ratio of asked questions in the category
5. Ratio of starred questions in the category

18 Local Features
Question-User Relationship (7 features, continued)
–Features describing the similarity between the question's language model and the user's language model
6. KL-divergence between the given question and the user's answered questions
7. KL-divergence between the given question and the user's background language model (answered, asked, and starred questions)

19 Global Features
Question (3 features)
–Category-level features that smooth each question
1. Average title length
2. Average detail length
–Whether the question is representative of its category
3. KL-divergence between the given question and the questions in the category it belongs to

20 Global Features
User History (2 features)
–Capture the uniqueness of a user
Question-User Relationship (1 feature)
–The more similar the language model of a user's answered questions is to that of the questions in a category, the more likely the user is to answer questions from that category
1. KL-divergence between the user's answered questions and the questions in the category the given question belongs to


22 Experiments
Classification Algorithm
–Support vector machine (SVM) with a linear kernel
Metrics
–Precision, recall, and F1 for the positive class
–Accuracy over both classes
Dataset
–Crawled from the "Answers", "Questions", and "Starred Questions" pages of 3,500 Yahoo! Answers users
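The four metrics named on this slide can be computed directly from true and predicted labels. The following is a minimal sketch in plain Python, assuming labels are encoded as 1 (the user answers) and -1 (the user does not); the function name `binary_metrics` and the label encoding are illustrative and not taken from the slides.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive class, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1]
print(binary_metrics(y_true, y_pred))
```

Reporting precision, recall, and F1 only for the positive class keeps the focus on the routing decision that matters in practice: whether a question sent to a user actually gets answered.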

23 Effect of Local Features
[Table: Precision, Recall, F1, and Accuracy for the Question, User History, and Question-User Relationship feature groups]
Question-User Relationship achieves the best F1 and recall
–Captures the user's performance and interests in the category of the given question
–Captures the semantic relatedness of the given question and the user
User History achieves the best precision
–Some users are quite active in the system
–These highly active users account for only a small percentage of all users

24 Effect of Local Features
[Table: Precision, Recall, F1, and Accuracy for Q + QU Relationship, U + QU Relationship, Q + U + QU Relationship, and the top 10 local features]
The combination of all local features achieves the best F1
Results with only the top 10 features are also encouraging

25 Effect of Local Features
Two most important local features
–KL-divergence between the given question and the questions answered by the user: captures most accurately the semantic relatedness between the given question and the user's knowledge
–KL-divergence between the given question and the questions answered, asked, and starred by the user: also reflects the user's interests by incorporating the other activity

26 Effect of Local and Global Features
[Table: Precision, Recall, F1, and Accuracy for Local, Global, and Local + Global features]
Combining local and global features retains the strengths of both and consequently achieves the best F1 score

27 Effect of Local and Global Features
Three most important features
–KL-divergence between the given question and the questions answered by the user
–KL-divergence between the given question and the questions answered, asked, and starred by the user
–KL-divergence between the given question and the questions from the same category
If a question is typical of its category, it has a higher chance of being answered. This may also partially explain why CQA services usually have well-structured categories.


29 Related Work
Question Routing
–Zhou et al. 2009: expertise-based question routing
–Li and King 2010: language-model-based framework combining expertise estimation and availability estimation
–Li et al. 2011: category-sensitive language model
Link Analysis and Expert Finding
–Jurczyk and Agichtein, 2007
–Zhang, Ackerman and Adamic, 2007
–Apply PageRank and HITS in social media


31 Conclusions
Formulate question routing as a classification task
Derive a variety of local and global features
Analyze the contributions from different sources
Thorough experimental study

32 Future Work
Semi-supervised approach
Incorporate social aspects into the model

33 Thanks
Q&A

34 FAQ
How to prepare positive and negative instances?
–If a user answered a question, we treat the question-user pair as a positive instance
–If a user asked a question, we treat the question-user pair as a negative instance
–Assumption: if a user asked a question, he/she probably did not possess the knowledge needed to answer it
–A more realistic negative instance: an open question is presented to a user, but the user does not answer it (such data is available only to CQA owners)
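The labeling rule on this slide can be sketched as a small function that turns per-user activity into labeled question-user pairs. The input layout (dictionaries mapping a user ID to question IDs), the +1/-1 label encoding, and the name `build_instances` are assumptions made for illustration; the slides do not describe the authors' data structures.

```python
def build_instances(answered: dict, asked: dict) -> list:
    """Build (question_id, user_id, label) triples.

    answered / asked map a user ID to the question IDs that the user
    answered / asked (hypothetical layout, assumed for this sketch).
    """
    instances = []
    # A user who answered a question forms a positive pair.
    for user_id, question_ids in answered.items():
        for question_id in question_ids:
            instances.append((question_id, user_id, +1))
    # A user who asked a question forms a negative pair, on the slide's
    # assumption that the asker lacks the knowledge to answer it.
    for user_id, question_ids in asked.items():
        for question_id in question_ids:
            instances.append((question_id, user_id, -1))
    return instances


answered = {"u1": ["q1", "q2"], "u2": ["q3"]}
asked = {"u1": ["q3"], "u2": ["q1"]}
for triple in build_instances(answered, asked):
    print(triple)
```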

35 FAQ
How to know the importance of each feature?
–We employ an SVM with a linear kernel, so features can be ranked by the absolute values of the model's weights. The weight of the j-th feature is calculated as follows:
$$ w_j = \sum_{i=1}^{n} \alpha_i \, y_i \, x_i^{j} $$
–n is the number of training samples, α_i is the learned coefficient of the i-th sample (nonzero only for support vectors), y_i is its label, and x_i^j is the value of the j-th feature of observation x_i
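The ranking procedure described here can be sketched in plain Python: sum α_i · y_i · x_i^j over the training samples to get each feature weight, then sort by absolute value. The function names, the input layout (parallel lists of coefficients, labels, and feature vectors), and the toy numbers are assumptions for illustration; in practice the coefficients would come from a trained linear SVM.

```python
def feature_weights(alphas, labels, samples):
    """Weight of feature j: sum over samples of alpha_i * y_i * x_i[j]."""
    weights = [0.0] * len(samples[0])
    for alpha, y, x in zip(alphas, labels, samples):
        if alpha == 0.0:
            continue  # samples that are not support vectors contribute nothing
        for j, value in enumerate(x):
            weights[j] += alpha * y * value
    return weights


def rank_features(weights, names):
    """Sort (name, weight) pairs by absolute weight, largest first."""
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)


# Toy values, chosen only to illustrate the computation.
alphas = [0.5, 0.0, 0.5]
labels = [1, 1, -1]
samples = [[2.0, 1.0], [5.0, 5.0], [0.0, 4.0]]

weights = feature_weights(alphas, labels, samples)
print(rank_features(weights, ["f0", "f1"]))
```

Comparing absolute weights is only meaningful when the features are on comparable scales, so some form of feature normalization before training is worth assuming here.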

36 FAQ
How to calculate KL-divergence?
–The Kullback-Leibler divergence of two distributions P and Q is calculated as follows:
$$ D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} $$
–P(i) and Q(i) are estimated by Maximum Likelihood Estimation (MLE)
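A KL-divergence between two unigram language models can be sketched as below. One caveat: with pure MLE estimates, any word that appears under P but never under Q gives Q(i) = 0 and an infinite divergence. The sketch therefore adds additive (add-one) smoothing over the joint vocabulary, which is an assumption and not something the slides describe; the function name and token-list inputs are also illustrative.

```python
import math
from collections import Counter


def kl_divergence(p_tokens, q_tokens, smoothing=1.0):
    """KL(P || Q) between unigram models built from two token lists.

    Assumption: additive smoothing over the joint vocabulary replaces
    the pure MLE estimate so that no probability is zero.
    """
    vocab = set(p_tokens) | set(q_tokens)
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    p_total = len(p_tokens) + smoothing * len(vocab)
    q_total = len(q_tokens) + smoothing * len(vocab)

    divergence = 0.0
    for word in vocab:
        p = (p_counts[word] + smoothing) / p_total
        q = (q_counts[word] + smoothing) / q_total
        divergence += p * math.log(p / q)
    return divergence


question = "how do i fix my bike chain".split()
user_history = "bike chain repair and bike tire repair tips".split()
print(kl_divergence(question, user_history))
```

KL-divergence is not symmetric, so which text plays the role of P (here the question) and which plays Q (here the user's history) changes the value.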