Enhancing Expert Finding Using Organizational Hierarchies
Maryam Karimzadehgan (U. Illinois Urbana-Champaign)*, Ryen White (MSR), Matthew Richardson (MSR)
Presented by Ryen White, Microsoft Research
* MSR Intern, Summer 08

Motivation for expert finding
- Some questions cannot be answered using a Web search engine: they involve tacit / procedural knowledge or internal org topics
- Some solutions:
  - Social connections (ask people, follow referrals): time-consuming in large organizations
  - Post to a forum or mail distribution list: may go unanswered, interrupts many people, high latency
  - Find one or more candidate experts and present the question to them
- Finding these experts is the challenge of expert finding...

Overview
- The task in expert finding is to find people in an organization with expertise on the query topic
- Profiles are typically constructed for each member from sources such as email / shared documents
- What if we don't have a profile for everyone?
- Can we use the organizational hierarchy to help us find experts without profiles and refine others' profiles?
- We propose and evaluate an algorithm that considers an org. member and the expertise of his or her neighbors

Organizational hierarchy
- Depicts managerial relationships between organizational members
  - Nodes represent members (people)
  - Links represent reporting and peer relationships
  - Peers are members with the same direct manager
- Can we use the hierarchy to improve expert finding by sharing expertise around the hierarchy?
[Figure: example hierarchy, with reporting relationships and peer relationships marked]

Does proximity ⇒ shared expertise?
- Before we can use neighbors as a proxy for a member's expertise, we must know whether their expertise is comparable
- People who work in the same group may have similar interests and expertise because:
  - They work on the same product
  - Their role is probably similar (dev, test, HR, legal, sales)
- Neighbors may be good proxies for those with no profile, but we should check to be sure...

Does proximity ⇒ shared expertise?
- We conducted a study at Microsoft Corporation
  - MS employs over 150,000 people, inc. temps/vendors
  - By crawling internal distribution lists, we created profiles for 24% of employees via their sent mail
  - This demonstrates the challenge (76% had no profile)
- Selected a random question from the internal "idunno" list:
  - Subject: Standard clip art catalog or library
  - Body: Do we have a corporate standard collection of clip art to use in presentations, specs, etc.?
- Found candidates and asked them to rate their own expertise

Does proximity ⇒ shared expertise?
- Asked for a self-evaluation: 0/1/2 = couldn't answer / some knowledge / could answer
- Emailed immediate neighbors and asked for the same self-evaluation
- An organizational member's expertise correlates strongly with neighbor expertise (caveat: for this particular question)
- Neighbors' expertise may be a good proxy for missing profiles, or useful to refine existing profiles
[Table: source member rating vs. mean neighbor rating, with N per rating level]

Expert Modeling Techniques

Baseline
- Language-modeling approach
- Build a profile based on email associated with each person
- Compute the probability that this model generates the query:

  p(q \mid e_j) = \prod_{w \in q} \frac{c(w, e_j) + \mu \, p(w \mid E)}{|e_j| + \mu}

  where e_j is the text representation of expertise for the j-th expert, c(w, e_j) is the number of times word w occurs in e_j, |e_j| is the total number of words in e_j, p(w | E) is estimated from all expertise documents E, and \mu is a Dirichlet prior, set empirically
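To make the scoring concrete, here is a minimal Python sketch of the Dirichlet-smoothed query likelihood above. The names (score_expert, collection_model) are illustrative rather than from the original system, and tokenization/stemming are assumed to have happened upstream.

```python
from collections import Counter

def score_expert(query_terms, expert_doc, collection_model, mu=100.0):
    """Dirichlet-smoothed query likelihood p(q | e_j).

    query_terms      -- stemmed words of the query q
    expert_doc       -- list of words in expert j's profile e_j
    collection_model -- dict: word -> p(w | E), estimated from all
                        expertise documents E
    mu               -- Dirichlet prior (set empirically; 100 in this talk)
    """
    if not expert_doc:
        return 0.0                        # members with no profile score zero
    counts = Counter(expert_doc)          # c(w, e_j)
    doc_len = len(expert_doc)             # |e_j|
    score = 1.0
    for w in query_terms:
        score *= (counts[w] + mu * collection_model.get(w, 0.0)) / (doc_len + mu)
    return score
```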

Hierarchy-based algorithm
- The Baseline is only effective if we have email for all members
- Since this is unlikely, we propose to use the org. hierarchy
- All members are scored with the Baseline (many get a zero score)
- Then, their scores are smoothed with their neighbors':

  s'(j) = \lambda \, s(j) + \frac{1 - \lambda}{|N(j)|} \sum_{k \in N(j)} s(k)

  where s(j) is the initial score using the Baseline, N(j) is the set of neighbors of j (so |N(j)| is the number of neighbors), and \lambda weights the member versus the neighbors
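A minimal sketch of one round of this smoothing, assuming the neighbor term is the average of neighbors' Baseline scores as in the formula above; smooth_scores and its arguments are illustrative names.

```python
def smooth_scores(scores, neighbors, lam=0.5):
    """One level of hierarchy smoothing:
    s'(j) = lam * s(j) + (1 - lam) * average of neighbors' scores.

    scores    -- dict: member j -> initial Baseline score s(j)
    neighbors -- dict: member j -> list of hierarchy neighbors N(j)
                 (manager, direct reports, peers)
    lam       -- lambda, weighting the member versus the neighbors
    """
    smoothed = {}
    for j, s_j in scores.items():
        nbrs = neighbors.get(j, [])
        nbr_avg = sum(scores.get(k, 0.0) for k in nbrs) / len(nbrs) if nbrs else 0.0
        smoothed[j] = lam * s_j + (1.0 - lam) * nbr_avg
    return smoothed
```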

Smoothing
- Multi-level: smoothing can propagate one, two, or three levels through the hierarchy
- Candidate expert = member w/ a query-relevant profile
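One plausible reading of multi-level smoothing, assuming it amounts to applying the one-level step from the previous sketch repeatedly so that expertise propagates one, two, or three hops:

```python
def smooth_multilevel(scores, neighbors, lam=0.5, levels=2):
    """Propagate expertise `levels` hops by iterating the one-level step."""
    for _ in range(levels):
        scores = smooth_scores(scores, neighbors, lam)  # from the sketch above
    return scores

# Toy example: b has a profile, a and c do not; b reports to a, c peers with b.
baseline = {"a": 0.0, "b": 0.9, "c": 0.0}
org = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(smooth_multilevel(baseline, org, lam=0.5, levels=2))
```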

Evaluation

Expert profiling
- Profiles were constructed for organizational members from emails sent to internal discussion lists within MS
  - Stemmed text; only used text they wrote (not the quoted question)
  - The "idunno" list was excluded from this crawl
- Average number of emails per employee = 29
- Median number of emails per employee = 6
- We have outgoing emails for only approximately 36,000 employees (there are ~153,000 employees)
- i.e., we have email information for only 24% of all employees

Expert-rating data
- Compare the baseline and hierarchy-based algorithms, with expert-rating data used as ground truth
- Devised and distributed a survey with 20 randomly-selected questions from the internal "idunno" discussion list
- Examples of questions from the list:
  - Where can I get technical support for MS SQL Server?
  - Who is the MS representative for college recruiting at UT Austin?
- The survey was distributed to the 1,832 members of the discussion list; 189 respondents rated their expertise as 0/1/2 for each of the 20 questions
  - 0/1/2 = couldn't answer / some knowledge / could answer

Methodology
- The Baseline is a sub-part of the hierarchy-based algorithm, which allowed us to isolate the effect of using the hierarchy
- Set the Dirichlet prior, \mu, to 100 and tuned the hierarchy smoothing parameter, \lambda; both were determined empirically via parameter sweeps
- Used the subjects of the 20 selected questions as test queries
- Expert rating of 2 = relevant; 0/1 = non-relevant
- Generated a ranked list of employees using each algorithm
- Computed precision-recall, averaged over all queries

Evaluation Results

Precision-recall
- Ranked all employees for each question
- Kept only those for whom we had ratings (189 total)
- Interpolated, averaged 11-point PR curve
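For reference, a sketch of the standard interpolated 11-point precision-recall computation used here; function and variable names are ours, not the paper's, and it assumes at least one relevant member per query.

```python
def eleven_point_pr(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    ranked   -- list of employee ids, best first (rated employees only)
    relevant -- set of ids who rated themselves 2 ("could answer")
    """
    prec, rec, hits = [], [], 0
    for i, m in enumerate(ranked, start=1):
        hits += m in relevant
        prec.append(hits / i)
        rec.append(hits / len(relevant))
    # Interpolated precision at recall r = max precision at any recall >= r.
    return [max((p for p, rr in zip(prec, rec) if rr >= k / 10), default=0.0)
            for k in range(11)]
```

The per-query curves are then averaged across the 20 test queries.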

Precision-recall - ranking
- The prior findings could be explained by the hierarchy-based algorithm simply returning more employees
- We therefore used each algorithm to rank all employees
- We kept only those for whom we had expert ratings, maintaining their relative rank order
- Rated employees that were not retrieved were not ignored; instead they were appended to the end of the result list in random order
- Computed precision-recall curves for each algorithm, where each point was averaged across 100 runs
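A sketch of that procedure, reusing eleven_point_pr from the previous sketch: the shuffled tail models appending unretrieved rated employees in random order, and averaging over 100 runs smooths out that randomness.

```python
import random

def averaged_pr(retrieved, rated, relevant, runs=100):
    """PR curve over rated employees only, averaged across random tie orders.

    retrieved -- the algorithm's ranked list, filtered to rated employees,
                 with relative order preserved
    rated     -- set of all employees with survey ratings (189 here)
    relevant  -- subset of `rated` with self-rating 2
    """
    missing = [m for m in rated if m not in set(retrieved)]
    total = [0.0] * 11
    for _ in range(runs):
        random.shuffle(missing)           # unretrieved employees in random order
        curve = eleven_point_pr(retrieved + missing, relevant)
        total = [t + c for t, c in zip(total, curve)]
    return [t / runs for t in total]
```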

Precision-recall - ranking
- Interpolated precision at zero recall is approximately equal across algorithms
- The hierarchy-based algorithm is also better at ranking

Further opportunities
- We investigated propagating keywords around the hierarchy rather than scores
  - Keyword performance was significantly worse, perhaps because of low keyword quality or a shortage of information about each employee (only a few emails each)
- Weighting edges between organizational members based on their relationship
  - e.g., peer-to-peer vs. manager-to-subordinate
- Experimenting with other sources: whitepapers, websites, communication patterns

Summary
- Expertise representation: use the org. hierarchy to address the data-sparseness challenge when we lack information for some org. members
- Expertise modeling: a hierarchy-based algorithm to share expertise information
- Evaluation: org. hierarchy and human-evaluated data from Microsoft
- Outcome: the org. hierarchy improves expert finding - useful on its own, or perhaps as a feature in machine learning (future work)