A language modeling framework for expert finding
Presenter: Lin, Shu-Han
Authors: Krisztian Balog, Leif Azzopardi, Maarten de Rijke
Information Processing and Management (IPM) 45 (2009) 1–19
Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
Outline
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments
Motivation
- Expert finding: given a topic, find the people in an organization who are experts on it.
- Current practice: "Yellow Pages" profiles in which employees self-assess their skills with keywords (e.g., marketing).
- Problems: the profile information quickly becomes antiquated, and keyword lists are too restricted.
Objectives
- Stay within the organization: mine its published intranet documents.
- Support search for all kinds of expertise.
- Example: "Who are the experts on the topic 'Internet marketing and internet advertising' in my organization?"
Methodology – Overview
- Goal: capture the association between a candidate expert and an area of expertise, i.e., answer "What is the probability of a candidate ca being an expert given the query topic q?" The slide's "(constant)", "Bayes' Theorem", and "(uniform)" notes annotate the ranking decomposition shown below.
- Model 1: candidate-based (query-independent) approach. Idea: build a profile of each candidate expert, then rank candidates against the query.
- Model 2: document-based (query-dependent) approach. Idea: find the query-relevant documents, then associate them with candidate experts.
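The equation that carried those annotations did not survive extraction; a minimal reconstruction, following the standard language-modeling decomposition the slide's notes point to, is:

```latex
\[
  p(ca \mid q) \;=\; \frac{p(q \mid ca)\, p(ca)}{p(q)}
               \;\propto\; p(q \mid ca)\, p(ca)
\]
% p(q) is constant for a fixed query and p(ca) is assumed uniform,
% so candidates are ranked by p(q | ca).
```

Models 1 and 2 differ only in how p(q | ca) is estimated.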
Methodology – Model 1
- Build a textual representation (candidate model θ_ca) of a person's knowledge from the documents associated with that person, then estimate the probability of the query given the candidate's model.
- Example: p(Internet Marketing | θ_ca) = p("Internet" | θ_ca) · p("Marketing" | θ_ca); for the full query, p(Internet marketing and internet advertising | θ_ca) = p("Internet" | θ_ca)^2 · p("Marketing" | θ_ca) · p("and" | θ_ca) · p("Advertising" | θ_ca).
- The slide's "(Smoothed)" and "(weighted)" notes: term probabilities are smoothed with a background collection model, and each document's contribution is weighted by its association with the candidate (see the sketch below).
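A minimal Python sketch of Model 1 scoring, assuming precomputed term counts per document, document-candidate association weights p(d | ca), a background collection model, and Jelinek-Mercer smoothing; the function and parameter names are illustrative, not the paper's:

```python
import math

def model1_score(query_terms, cand_docs, doc_tf, doc_len, bg_prob, lam=0.5):
    """Model 1: return log p(q | theta_ca) for one candidate.

    cand_docs: {doc_id: p(d | ca)} association weights for this candidate.
    doc_tf:    {doc_id: {term: count}} term frequencies per document.
    doc_len:   {doc_id: total term count}.
    bg_prob:   {term: p(t)} background (collection) language model.
    lam:       Jelinek-Mercer smoothing weight for the background model.
    """
    score = 0.0
    for t in query_terms:
        # p(t | ca): mixture over the candidate's documents, weighted by p(d | ca)
        p_t_ca = sum(p_d_ca * doc_tf[d].get(t, 0) / doc_len[d]
                     for d, p_d_ca in cand_docs.items())
        # smooth with the background model so unseen terms do not zero the product
        p_t = (1 - lam) * p_t_ca + lam * bg_prob.get(t, 1e-9)
        score += math.log(p_t)   # log of the product over query terms
    return score
```

Repeated query terms (e.g., "internet" appearing twice) contribute one log term per occurrence, which matches the squared factor in the slide's example.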
Methodology – Model 1B
- Estimate p(t | d, ca) directly, using a candidate identifier (name or e-mail address) and a window of size w around each mention of the candidate.
- Example: p("Internet" | "Mail.No.43", "John") is estimated from text such as "… John (john@gmail.com) is a major in marketing …", where both "John" and "john@gmail.com" act as identifiers of the candidate.
- The closer a term is to the candidate mention, the more weight it carries (see the sketch below).
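A minimal sketch of the window-based idea: only terms within w positions of a recognized candidate mention are counted as evidence. The counting scheme and names below are illustrative assumptions, not the paper's exact estimator:

```python
def window_term_probs(tokens, mention_positions, w):
    """Estimate p(t | d, ca) from terms near candidate mentions.

    tokens:            the document as a list of terms.
    mention_positions: token positions where the candidate (name or e-mail) occurs.
    w:                 window size; only terms within w tokens of a mention count.
    """
    counts = {}
    for pos in mention_positions:
        lo, hi = max(0, pos - w), min(len(tokens), pos + w + 1)
        for t in tokens[lo:hi]:
            counts[t] = counts.get(t, 0) + 1
    total = sum(counts.values())
    # relative frequency of t inside the candidate's windows
    return {t: c / total for t, c in counts.items()} if total else {}
```

A large window approaches the plain document estimate p(t | d); a small window emphasizes terms right next to the mention, matching the slide's note "the closer, the more powerful".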
Methodology – Model 2
- Find the documents relevant to the query, then aggregate their relevance over the candidates associated with them; the document language models are smoothed (the slide's "(Smoothed)" note), as reconstructed below.
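The Model 2 equation on this slide is a lost image; a reconstruction consistent with the document-based formulation described above (n(t,q) is the number of times t occurs in q, and p(d | ca) is the document-candidate association) is:

```latex
\[
  p(q \mid ca) \;=\; \sum_{d} \Big( \prod_{t \in q} p(t \mid \theta_d)^{\,n(t,q)} \Big)\, p(d \mid ca),
  \qquad
  p(t \mid \theta_d) \;=\; (1-\lambda)\, p(t \mid d) \;+\; \lambda\, p(t)
\]
```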
Methodology – Model 2B
- Model 2B is the window-based variant of Model 2: the document language model is replaced by the proximity-based estimate p(t | d, ca) introduced for Model 1B, as reconstructed below.
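The Model 2B equations on this slide are also lost images; a hedged reconstruction, reusing the window-based estimate p(t | d, ca) from Model 1B, is:

```latex
\[
  p(q \mid ca) \;=\; \sum_{d} \Big( \prod_{t \in q} p(t \mid d, ca)^{\,n(t,q)} \Big)\, p(d \mid ca)
\]
```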
Methodology – document-candidate associations
- Boolean model: a candidate is associated with a document simply if the candidate is mentioned in it.
- TF-IDF-like (frequency-based) model: frequent mentions of the candidate in a document strengthen the association, while candidates mentioned in very many documents (e.g., a senior member of the organization) are discounted; document importance can also be taken into account.
- A sketch of both schemes follows.
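A minimal sketch of the two association schemes, assuming per-document candidate-mention counts are already available; the exact normalization is an illustrative choice, not necessarily the paper's weighting:

```python
import math

def boolean_assoc(mentions, d, ca):
    """Boolean association: 1 if candidate ca is mentioned in document d, else 0."""
    return 1.0 if mentions.get(d, {}).get(ca, 0) > 0 else 0.0

def tfidf_assoc(mentions, d, ca, n_docs):
    """Frequency-based (TF-IDF-like) association: frequent mentions in d count more,
    while candidates mentioned in very many documents (e.g. senior members) are discounted.

    mentions: {doc_id: {candidate: mention_count}}
    n_docs:   total number of documents in the collection.
    """
    tf = mentions.get(d, {}).get(ca, 0)
    df = sum(1 for doc in mentions.values() if doc.get(ca, 0) > 0)
    if tf == 0 or df == 0:
        return 0.0
    return (1 + math.log(tf)) * math.log(n_docs / df)
```

Either score can then be normalized over a candidate's documents to obtain the p(d | ca) used by Models 1 and 2.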
Experiments
- Evaluation measures:
  - MAP (mean average precision)
  - MRR (mean reciprocal rank): e.g., if the first relevant expert for three queries appears at ranks 3, 2, and 1, MRR = (1/3 + 1/2 + 1)/3 = 11/18.
- Both measures are sketched below.
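A short sketch of the two measures; inputs are 1-based ranks of relevant experts per query, and the example values are illustrative:

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR: average over queries of 1/rank of the first relevant result.
    e.g. ranks [3, 2, 1] -> (1/3 + 1/2 + 1) / 3 = 11/18."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

def mean_average_precision(relevant_ranks_per_query):
    """MAP: mean over queries of average precision, where average precision is the
    mean of precision@k taken at the rank k of each retrieved relevant result."""
    aps = []
    for ranks in relevant_ranks_per_query:
        precisions = [(i + 1) / r for i, r in enumerate(sorted(ranks))]
        aps.append(sum(precisions) / len(precisions))
    return sum(aps) / len(aps)

print(mean_reciprocal_rank([3, 2, 1]))        # 0.6111... = 11/18
print(mean_average_precision([[1, 3], [2]]))  # ((1 + 2/3)/2 + 1/2) / 2 = 2/3
```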
Experiments
- Model 1 vs. Model 2
- Window-based models
Experiments
- Association methods
- Parameter sensitivity
Conclusions
- Model 1: build a profile of each candidate expert and rank candidates against the query.
- Model 2: find the query-relevant documents and then associate them with candidate experts.
- Model 2 is preferred over Model 1:
  - Effectiveness: better in terms of average precision and reciprocal rank.
  - Implementation: it only requires a regular document index.
- The window-based extensions improve effectiveness, especially on top of Model 1.
- Frequency-based (TF-IDF) document-candidate associations are helpful.
Comments
- Advantage: integrates ideas
- Drawback: …
- Application: …