Weakly Learning to Match Experts in Online Community

Presentation transcript:

Weakly Learning to Match Experts in Online Community. Yujie Qian†‡, Jie Tang†, Kan Wu† († Tsinghua University, ‡ Massachusetts Institute of Technology). Hi, I am Yujie Qian. Today I am presenting "Weakly Learning to Match Experts in Online Community", joint work with Jie Tang and Kan Wu. This work was done while I was studying at Tsinghua University.

Question-and-Answer: a question, and answers by other users. In this paper, we study the problem of matching experts in online communities. Let's start with an example. On question-and-answer websites such as Quora, users can post questions, and other users who are familiar with the topic may answer them.

Question-and-Answer: invite users to answer unsolved questions. User response: agree to answer / decline to answer. To keep these QA websites helpful and efficient, we want questions to receive qualified answers within a reasonable time. A central task for these websites is therefore to find appropriate users for each given question, and most QA websites now actively invite users to answer questions. Here are three questions recommended to me on Quora; for each one, I can choose to agree or decline to answer. In this example, the task of matching experts is to find users who are both able and willing to answer a given question.

Peer Review: invite experts to review journal/conference submissions. Paper information (title, abstract, authors, …); Reviewer 1: decline to review; Reviewer 2: agree to review; Reviewer 3: decline to review; Reviewer 4: no response. Another example is academic peer review. In academic conferences and journals, the organizers need to invite experts to review the submissions. This figure shows a journal management website. The journal editor can see the submission's information, including the title, authors, abstract, and main content, and then invite several reviewers to review the paper. However, a serious problem is that the acceptance rate of review invitations is usually quite low. For the paper in the figure, only one of the four invited reviewers agreed to review, while the other three declined or did not respond, so the editor has to invite additional reviewers.

How to match the question/paper with the best experts? The best experts should have sufficient knowledge of the topic and be willing to answer/review. The problem we study in this work is how to match a question or paper with the best experts. Two things need to be considered. First, the experts should have sufficient knowledge of the specific topic; this has been the focus of most previous research. Second, the experts should be willing to answer the question or review the paper. We want to emphasize that the latter is the actual goal of our expert matching problem, yet it is usually neglected in previous work.

Problem: Match Questions to Experts. Input: a query $q$ (question/paper) and candidate experts $E = \{e_1, \ldots, e_N\}$. Output: a ranked expert list, e.g. Rank 1: $e_1$ (score $S_{q1}$), Rank 2: $e_2$ (score $S_{q2}$), …, Rank $N$: $e_N$ (score $S_{qN}$). Formally, the input of the expert matching problem has two parts: a query $q$, which can be either a question or a paper, and a set of candidate experts. The output is a ranked expert list in which each expert is associated with a ranking score.

Formulation. Rank score of expert $e_i$ for query $q$: a combination of expertise matching and willingness to answer, where $\alpha$ is a trade-off parameter; when $\alpha = 1$, the problem reduces to traditional expertise matching. We define the ranking score as a trade-off between the expertise matching degree and the willingness to answer, controlled by the parameter $\alpha$. Note that when we set $\alpha = 1$, the problem reduces to the traditional expertise matching problem.
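The slide's equation did not survive extraction. As a minimal sketch, assuming a linear (convex) combination $S_{qi} = \alpha \cdot \text{match}(q, e_i) + (1 - \alpha) \cdot \text{willingness}(q, e_i)$ (an assumption, not necessarily the paper's exact form), the trade-off and the $\alpha = 1$ special case behave as follows. `rank_score`, `rank_experts`, and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_score(match: float, willingness: float, alpha: float) -> float:
    """Hypothetical linear trade-off: alpha * match + (1 - alpha) * willingness.

    With alpha = 1 the willingness term vanishes and the score reduces to
    plain expertise matching.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * match + (1.0 - alpha) * willingness


def rank_experts(
    match: Dict[str, float], willingness: Dict[str, float], alpha: float
) -> List[Tuple[str, float]]:
    """Return (expert, score) pairs sorted by descending score."""
    scored = [(e, rank_score(match[e], willingness[e], alpha)) for e in match]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up inputs: e1 is the strongest topical match but is unlikely to accept.
match = {"e1": 0.9, "e2": 0.7, "e3": 0.4}
willingness = {"e1": 0.1, "e2": 0.8, "e3": 0.9}

print(rank_experts(match, willingness, alpha=1.0))  # pure expertise matching
print(rank_experts(match, willingness, alpha=0.5))  # balanced trade-off
```

With $\alpha = 1$ the order is e1, e2, e3; with $\alpha = 0.5$ the knowledgeable but unwilling e1 drops to last place, which is the behavior the trade-off is meant to capture.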

Challenges: it is difficult to predict expert responses, difficult to collect labeled data, and difficult to evaluate the performance of a potential solution. First, it is very difficult to predict expert responses, since many factors may affect whether an expert agrees. Second, it is usually difficult to collect sufficient labeled data. Moreover, it is not easy to evaluate the performance of a potential solution, especially online.

Motivation: incorporate the correlations between experts. Observation: an expert whose "friend" has already declined is more likely to decline as well. In this work, our main idea is to incorporate the correlations between experts to better predict expert responses. This is motivated by the observation that an expert whose "friend" has already declined the invitation is more likely to decline as well. The correlations are defined differently for each dataset; for example, we use coauthorship when finding paper reviewers.
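One way to check this observation on invitation logs is to compare the overall decline rate with the decline rate among experts who have a correlated "friend" that declined the same invitation. The sketch below uses hypothetical toy data and treats coauthorship as the friendship relation; it ignores response timing, so it only approximates "already declined". `decline_rates` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy log: responses[query][expert] is True if the expert accepted.
responses: Dict[str, Dict[str, bool]] = {
    "q1": {"a": False, "b": False, "c": True},
    "q2": {"a": True, "b": True, "d": False},
    "q3": {"c": False, "d": False, "b": True},
}
# Hypothetical "friend" relation (e.g. coauthorship) as unordered pairs.
friends: Set[FrozenSet[str]] = {frozenset(("a", "b")), frozenset(("c", "d"))}


def decline_rates(
    responses: Dict[str, Dict[str, bool]], friends: Set[FrozenSet[str]]
) -> Tuple[float, float]:
    """Return (overall decline rate, decline rate among experts with a declining friend)."""
    total = declined = 0
    cond_total = cond_declined = 0
    for invited in responses.values():
        for expert, accepted in invited.items():
            total += 1
            declined += not accepted
            # Did any correlated expert invited for the same query decline?
            friend_declined = any(
                frozenset((expert, other)) in friends and not invited[other]
                for other in invited
                if other != expert
            )
            if friend_declined:
                cond_total += 1
                cond_declined += not accepted
    conditional = cond_declined / cond_total if cond_total else float("nan")
    return declined / total, conditional


overall, given_friend = decline_rates(responses, friends)
print(overall, given_friend)
```

On this made-up data the conditional decline rate is 1.0 against an overall rate of 5/9; the numbers only illustrate the computation and are not results from the paper.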

Our Solution: Weakly Supervised Factor Graph (WeakFG). We propose a weakly supervised factor graph, WeakFG for short, to address the challenges discussed above.

WeakFG. Nodes: query-expert pairs; edges: correlations; embeddings; output: whether each expert will accept or decline. Local factor $f(q, e_i, \mathbf{v}_i, y_i)$: local attributes of each query-expert pair. Correlation factor $g(e_i, e_j, y_i, y_j)$: correlations between experts. In WeakFG, we define a graph whose nodes are the query-expert pairs and whose edges represent the correlations between experts. Two kinds of factors are defined: the local factor $f$ captures the local attributes of each query-expert pair, and the correlation factor $g$ captures the correlation between experts.
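A minimal sketch of the graph structure for a single query, assuming experts are identified by strings and correlations are given as unordered pairs (e.g. coauthors). `FactorGraph` and `build_graph` are hypothetical names, and the embeddings $\mathbf{v}_i$ are omitted for brevity.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class FactorGraph:
    """Hypothetical container for one query's WeakFG-style graph."""
    nodes: List[Tuple[str, str]]  # (query, expert) pairs; each carries an output y_i
    edges: List[Tuple[int, int]]  # node-index pairs joined by a correlation factor g
    labels: Dict[int, int] = field(default_factory=dict)  # known y_i (1 accept, 0 decline)


def build_graph(
    query: str,
    candidates: List[str],
    correlated: Set[FrozenSet[str]],
    observed: Optional[Dict[str, int]] = None,
) -> FactorGraph:
    nodes = [(query, e) for e in candidates]
    # One correlation factor per pair of related candidates (e.g. coauthors).
    edges = [
        (i, j)
        for i, j in combinations(range(len(candidates)), 2)
        if frozenset((candidates[i], candidates[j])) in correlated
    ]
    # Weak supervision: only some experts' responses are known.
    labels = {i: observed[e] for i, e in enumerate(candidates) if observed and e in observed}
    return FactorGraph(nodes, edges, labels)


graph = build_graph(
    "paper-42",
    ["a", "b", "c", "d"],
    correlated={frozenset(("a", "b")), frozenset(("c", "d"))},
    observed={"b": 0},
)
print(graph.edges)   # [(0, 1), (2, 3)]
print(graph.labels)  # {1: 0}
```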

Expertise matching score: for a query $q$ and an expert $e$ represented by a set of documents $\{d_k\}$, $\mathrm{score}(q, e) = \mathrm{agg}_{k}\, \mathrm{Sim}(q, d_k)$ with $\mathrm{agg} \in \{\max, \text{average}\}$. The document similarity $\mathrm{Sim}(q, d_k)$ can be implemented in different ways: language models, topic models, and embedding methods such as Word Mover's Distance (WMD) [1] and Document to Vector (D2V) [2]. We first explain how we calculate the expertise matching score in our work. Each expert can be viewed as a set of documents, such as the questions they have answered or the papers they have published. We define the expertise matching score as the aggregation, by max or average, of the similarities between the query and each document in the expert's set. The document similarity can be implemented with language models or topic models, and we also adopt recent embedding-based methods, such as Word Mover's Distance and Document to Vector, to improve expertise matching performance.
[1] M. Kusner et al. "From Word Embeddings to Document Distances." International Conference on Machine Learning (ICML), 2015.
[2] Q. Le and T. Mikolov. "Distributed Representations of Sentences and Documents." International Conference on Machine Learning (ICML), 2014.
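As a sketch, assuming each document is already represented by a dense vector (for example, a D2V-style embedding) and using cosine similarity as $\mathrm{Sim}(q, d_k)$, the max/average aggregation can be written as below. `cosine` and `expertise_match` are hypothetical helpers, not the authors' code.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expertise_match(
    query_vec: Sequence[float], expert_docs: List[Sequence[float]], agg: str = "max"
) -> float:
    """Aggregate Sim(q, d_k) over an expert's documents with max or average."""
    sims = [cosine(query_vec, d) for d in expert_docs]
    if not sims:
        return 0.0  # expert with no documents
    if agg == "max":
        return max(sims)
    if agg == "average":
        return sum(sims) / len(sims)
    raise ValueError("agg must be 'max' or 'average'")


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0]]  # one on-topic and one off-topic document
print(expertise_match(query, docs, "max"))      # 1.0
print(expertise_match(query, docs, "average"))  # 0.5
```

Max aggregation credits an expert for even one highly relevant document, while average aggregation favors experts whose whole profile is on topic.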

WeakFG. Local factor function and correlation factor function, where $\psi(\cdot)$ denotes the features of each query-expert pair (e.g., expertise matching scores, statistics, …) and $\phi(\cdot)$ is an indicator for a specific correlation between experts. We define two classes of factor functions in WeakFG. The first is the local factor function, which captures the expertise matching scores as well as other statistical features associated with each query-expert pair. The correlation factor function is defined between the output variables of related experts to encode their correlation. Here, $\alpha$ and $\beta$ are the model parameters to be learned; unlike the controllable trade-off parameter $\alpha$ in the ranking score, these are estimated from data.
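The slide's formulas were lost in extraction. A common choice for factor graphs of this kind, assumed here rather than taken from the slide, is a log-linear form: $f = \exp(\boldsymbol{\alpha}^\top \psi)$ with the features active only when $y_i = 1$, and $g = \exp(\beta \phi)$ with $\phi = 1$ when two correlated experts give the same response. The sketch enumerates all assignments of a tiny graph to show how a positive $\beta$ lets one expert's decline lower a correlated expert's acceptance probability. All function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple


def local_factor(x: Sequence[float], y: int, alpha: Sequence[float]) -> float:
    """Assumed f = exp(alpha . psi), where psi = x when y = 1 and 0 otherwise."""
    return math.exp(sum(a * xi for a, xi in zip(alpha, x))) if y == 1 else 1.0


def correlation_factor(yi: int, yj: int, beta: float) -> float:
    """Assumed g = exp(beta * phi), where phi = 1 if the two experts respond alike."""
    return math.exp(beta) if yi == yj else 1.0


def joint_distribution(
    features: List[List[float]], edges: List[Tuple[int, int]],
    alpha: Sequence[float], beta: float,
) -> Dict[Tuple[int, ...], float]:
    """Normalized P(Y) by brute-force enumeration (only feasible for tiny graphs)."""
    weights = {}
    for ys in product((0, 1), repeat=len(features)):
        w = 1.0
        for x, y in zip(features, ys):
            w *= local_factor(x, y, alpha)
        for i, j in edges:
            w *= correlation_factor(ys[i], ys[j], beta)
        weights[ys] = w
    z = sum(weights.values())
    return {ys: w / z for ys, w in weights.items()}


def prob_accept(dist, i: int, given: Optional[Dict[int, int]] = None) -> float:
    """P(y_i = 1 | given), where given maps node index to an observed response."""
    given = given or {}
    consistent = {ys: p for ys, p in dist.items()
                  if all(ys[k] == v for k, v in given.items())}
    return sum(p for ys, p in consistent.items() if ys[i] == 1) / sum(consistent.values())


# Two correlated experts with neutral local features.
dist = joint_distribution(features=[[0.0], [0.0]], edges=[(0, 1)], alpha=[1.0], beta=1.0)
print(prob_accept(dist, 1))          # marginal acceptance of expert 1
print(prob_accept(dist, 1, {0: 0}))  # given that expert 0 declined
```

With neutral local features and $\beta = 1$, expert 1's marginal acceptance probability is 0.5, but conditioning on expert 0 declining lowers it to $1/(1+e) \approx 0.27$.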

Model Learning. Objective function; optimization: gradient ascent, with the gradient estimated by Loopy Belief Propagation (LBP). We define the maximum likelihood objective by combining the two kinds of factor functions and optimize it with gradient ascent. Estimating the gradient is non-trivial in factor graphs, so in this work we use Loopy Belief Propagation to estimate the gradient at each update.
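The slide's objective was also lost. A minimal sketch, assuming log-linear factors ($f = \exp(\boldsymbol{\alpha}^\top \psi)$ with features active when $y_i = 1$, and $g = \exp(\beta \phi)$ with $\phi = 1$ when correlated experts agree) and a weakly supervised likelihood in which unlabeled experts are summed out: the gradient is the expected feature count with observed responses clamped minus the expected count under the free model. To stay self-contained, the sketch computes both expectations by exact enumeration over a tiny graph instead of Loopy Belief Propagation, which the talk uses for realistic graphs. All names and the toy data are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def feature_counts(ys: Sequence[int], features: List[List[float]], edges: List[Edge]) -> List[float]:
    """F(Y): local features summed over accepting experts, then the count of agreeing edges."""
    local = [0.0] * len(features[0])
    for x, y in zip(features, ys):
        if y == 1:
            local = [acc + xk for acc, xk in zip(local, x)]
    agree = float(sum(ys[i] == ys[j] for i, j in edges))
    return local + [agree]


def moments(theta: List[float], features: List[List[float]], edges: List[Edge],
            labels: Dict[int, int]) -> Tuple[float, List[float]]:
    """(log Z, E[F]) over assignments consistent with labels, by exact enumeration."""
    scores, stats = [], []
    for ys in product((0, 1), repeat=len(features)):
        if any(ys[i] != v for i, v in labels.items()):
            continue  # clamp observed responses
        f = feature_counts(ys, features, edges)
        scores.append(sum(t * fk for t, fk in zip(theta, f)))
        stats.append(f)
    m = max(scores)  # shift for a numerically stable log-sum-exp
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    expected = [sum(w * f[k] for w, f in zip(weights, stats)) / z for k in range(len(theta))]
    return m + math.log(z), expected


def log_likelihood(theta, features, edges, labels) -> float:
    """log P(Y_L) = log Z(clamped) - log Z(free); unlabeled experts are summed out."""
    return moments(theta, features, edges, labels)[0] - moments(theta, features, edges, {})[0]


def gradient(theta, features, edges, labels) -> List[float]:
    """E_{P(Y | Y_L)}[F] - E_{P(Y)}[F]; here exact, whereas the talk estimates it with LBP."""
    clamped = moments(theta, features, edges, labels)[1]
    free = moments(theta, features, edges, {})[1]
    return [c - f for c, f in zip(clamped, free)]


def train(features, edges, labels, lr: float = 0.1, steps: int = 200) -> List[float]:
    theta = [0.0] * (len(features[0]) + 1)  # local weights (alpha) followed by beta
    for _ in range(steps):
        theta = [t + lr * g for t, g in zip(theta, gradient(theta, features, edges, labels))]
    return theta


# Toy graph: expert 1 is a coauthor of experts 0 and 2; only 0 and 2 have responded.
features = [[1.0], [1.0], [-1.0]]
edges = [(0, 1), (1, 2)]
labels = {0: 1, 2: 0}
theta = train(features, edges, labels)
print(theta, log_likelihood(theta, features, edges, labels))
```

Exact enumeration costs $O(2^n)$ per expectation, which is why an approximate method such as LBP is needed once a query has more than a handful of candidates.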

Prediction: find the most likely outputs given the query $q$. Candidate generation: first use a language model (LM) to retrieve candidate experts by coarse-level matching. In the prediction phase, the goal is to find the most likely outputs given the query. We first use a language model for coarse-level matching to generate the candidates, and then use WeakFG to rank them. For more details about training and prediction, please see our paper.
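For the coarse candidate-generation step, one standard choice, assumed here because the talk only says "language model", is query likelihood with Jelinek-Mercer smoothing over each expert's concatenated documents. `lm_candidates`, the smoothing weight `lam`, and the toy profiles are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def lm_candidates(
    query: str, profiles: Dict[str, str], k: int, lam: float = 0.5
) -> List[Tuple[str, float]]:
    """Top-k experts by log P(query | profile), Jelinek-Mercer smoothed (needs 0 <= lam < 1)."""
    tokenized = {e: text.lower().split() for e, text in profiles.items()}
    collection = Counter(t for toks in tokenized.values() for t in toks)
    coll_len = sum(collection.values())
    if coll_len == 0:
        return []
    scored = []
    for expert, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            p_coll = collection[term] / coll_len
            if p_coll == 0.0:
                continue  # unseen everywhere: log(0) for every expert, so skip the term
            p_doc = tf[term] / len(toks) if toks else 0.0
            score += math.log(lam * p_doc + (1.0 - lam) * p_coll)
        scored.append((expert, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


profiles = {
    "alice": "factor graph belief propagation inference",
    "bob": "word embeddings document similarity",
    "carol": "factor graph learning social networks",
}
print(lm_candidates("factor graph inference", profiles, k=2))
```

The top-k experts returned by this coarse step would then be re-ranked by WeakFG.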

Experiments. Datasets: QA-Expert* (match questions to users of a QA website; 182 questions, 599 users) and Paper-Reviewer (match conference submissions to program committee members; 935 submissions, 440 PC members). To evaluate our method, we performed both offline and online evaluation. For offline evaluation, we conducted experiments on two different datasets. QA-Expert is a dataset from an international data challenge; the task is to match questions to users of a QA website. Paper-Reviewer is constructed from the reviewing data of a conference; the task is to match submissions to program committee members. We treat the program committee members' biddings as positive responses. * https://biendata.com/competition/bytecup2016/

Results: WeakFG outperforms traditional expertise matching and the baseline ranking algorithm. From the results, we can clearly see that WeakFG outperforms traditional expertise matching methods and the baseline RankSVM algorithm. This confirms the necessity of considering the expert's willingness to answer or review in expert matching. We also validate that incorporating correlations between experts is beneficial: it lets us better utilize the labeled data and thus improve the predictions.

Online Evaluation: Reviewer Recommender*, supported by AMiner (aminer.cn), helps journal editors find qualified reviewers; deployed on the Chrome Web Store; used by journals such as ACM TKDD and Science China. WeakFG improves recommendation quality! To perform online evaluation, we developed an online reviewer recommendation tool based on the AMiner academic data mining system and deployed it on the Chrome Web Store. It helps journal editors find qualified reviewers for a given submission. The tool has already been used by several journals, such as ACM Transactions on Knowledge Discovery from Data and Science China. Our online evaluation results also show that WeakFG significantly improves recommendation quality compared with traditional methods. * https://chrome.google.com/webstore/detail/reviewer-recommender/holgegjhfkdkpclackligifbkphemhmg

Thank you! Yujie Qian, yujieq@mit.edu. This is the end of my presentation. Thank you for listening!