Published by Randolf Haynes. Modified over 6 years ago.
1
Modeling Score Distributions for Combining the Outputs of Search Engines
Reading Notes
Wang Ning
Lab of Database and Information Systems
Dec 3rd, 2003
2
Revision History
- Nov. 30th, 2003: Draft
- Dec. 1st, 2003: Add all pictures
- Dec. 2nd, 2003: Add references
3
Literature Information
- Title: Modeling Score Distributions for Combining the Outputs of Search Engines
- Authors: R. Manmatha, T. Rath, F. Feng
- Institution: Center for Intelligent Information Retrieval, University of Massachusetts
- Conference: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
4
Basic Idea
- Meta search: combining results from search engines
- Difficulties
  - No architecture and algorithm information
  - No score information
- Previous work
  - Linear combination of document ranks
  - CombMIN, CombMAX, CombSUM, CombMNZ
- The authors' idea: model the score distributions
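To make the previous-work baselines concrete, here is a minimal sketch of CombSUM and CombMNZ. It assumes each engine's output is a dictionary from document id to an already normalized score, and it treats a document's presence in a run as a non-zero score. The function names and the data layout are illustrative, not taken from the paper.

```python
from collections import defaultdict


def comb_sum(runs):
    """CombSUM: add up each document's scores over all engines."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of engines returning the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1  # assumption: listed in a run means non-zero score
    return {doc: sums[doc] * hits[doc] for doc in sums}


if __name__ == "__main__":
    runs = [{"a": 0.5, "b": 0.25}, {"a": 0.25}]
    print(comb_sum(runs))
    print(comb_mnz(runs))
```

CombMIN and CombMAX follow the same pattern with `min` and `max` in place of the sum.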
5
Test Data
- TREC: Text REtrieval Conference
  - TREC 3, TREC 4
  - TREC 6 for Chinese documents
- Search engines
  - INQUERY (probabilistic model)
  - CITY (probabilistic model)
  - SMART (vector space model)
  - Bellcore (LSI engine)
6
Model Assumptions
- The scores of the non-relevant documents can be modeled with an exponential distribution
- The scores of the relevant documents can be modeled with a Gaussian distribution
- Explanations and arguments come later
7
Non-relevant Documents: Exponential Distribution
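The formula on this slide did not survive, so the following is the standard exponential density written in assumed notation: s is a document's score and λ is the rate parameter, so the mean score is 1/λ.

```latex
\[
p(s \mid \text{non-relevant}) = \lambda \, e^{-\lambda s},
\qquad s \ge 0,\ \lambda > 0
\]
```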
8
Relevant Documents: Gaussian Distribution
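The formula on this slide did not survive, so the following is the standard Gaussian density written in assumed notation: s is a document's score, μ the mean and σ the standard deviation of the relevant documents' scores.

```latex
\[
p(s \mid \text{relevant}) =
\frac{1}{\sqrt{2\pi}\,\sigma}
\exp\!\left( -\frac{(s - \mu)^2}{2\sigma^2} \right)
\]
```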
9
Likelihood Function
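As a reminder of the general definition (the slide's own formula is missing, so the notation is assumed): for independent scores s_1, ..., s_n drawn from a density p(s | Θ), the likelihood and log-likelihood of the parameter Θ are

```latex
\[
L(\Theta) = \prod_{i=1}^{n} p(s_i \mid \Theta),
\qquad
\log L(\Theta) = \sum_{i=1}^{n} \log p(s_i \mid \Theta)
\]
```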
10
MLE: Maximum Likelihood Estimate
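For the exponential and Gaussian models the maximum likelihood estimates have well-known closed forms: the exponential rate is the reciprocal of the sample mean, and the Gaussian parameters are the sample mean and the (biased, divide-by-n) standard deviation. A minimal sketch, with function names chosen only for illustration:

```python
from statistics import fmean, pstdev


def fit_exponential(scores):
    """MLE of the exponential rate: 1 / sample mean."""
    return 1.0 / fmean(scores)


def fit_gaussian(scores):
    """MLE of the Gaussian: sample mean and population (divide-by-n) std."""
    return fmean(scores), pstdev(scores)


if __name__ == "__main__":
    print(fit_exponential([0.5, 1.5]))
    print(fit_gaussian([1.0, 3.0]))
```

These closed forms apply only when the relevance labels are known, so that each set of scores can be fitted separately.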
11
Basic Idea of MLE
- Intuition: the event that was actually observed is assumed to be the one with the highest probability
- The MLE of Θ is the parameter value under which the observed sample is most likely to occur
12
Limitations of Gaussian Fit
- Fits well: sufficient relevant documents (>=60)
- Fits badly: fewer relevant documents (the usual case)
- Why?
  - Model fault
  - Lack of samples (the authors' point)
- Solutions
  - Maybe Bayesian analysis works here
13
Mixture Model Fit
14
Mixture Model Fit (cont.)
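When relevance labels are unknown, the observed scores can be viewed as draws from a two-component mixture of the exponential and Gaussian densities. In assumed notation, with w the fraction of relevant documents:

```latex
\[
p(s) = (1 - w)\,\lambda e^{-\lambda s}
     + w\,\frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\left( -\frac{(s - \mu)^2}{2\sigma^2} \right),
\qquad 0 \le w \le 1
\]
```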
15
EM: Expectation Maximization
Important parameter estimation method
16
EM Steps
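The two EM steps for an exponential-plus-Gaussian mixture can be sketched as below. This is a generic textbook EM loop, not the authors' implementation. The initial guess (top fifth of the scores treated as relevant), the fixed iteration count, and all names are assumptions, and the sketch has no safeguards against a collapsing component.

```python
import math


def exp_pdf(s, lam):
    return lam * math.exp(-lam * s)


def gauss_pdf(s, mu, sigma):
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_fit(scores, iters=200):
    """Fit w * Gaussian + (1 - w) * Exponential to non-negative scores."""
    # Crude initial guess (an assumption): the top fifth looks "relevant".
    ordered = sorted(scores)
    top = ordered[int(0.8 * len(ordered)):]
    w = 0.2
    lam = 1.0 / (sum(ordered) / len(ordered))
    mu = sum(top) / len(top)
    sigma = max(1e-3, math.sqrt(sum((s - mu) ** 2 for s in top) / len(top)))
    for _ in range(iters):
        # E-step: responsibility of the Gaussian (relevant) component.
        resp = []
        for s in scores:
            g = w * gauss_pdf(s, mu, sigma)
            e = (1.0 - w) * exp_pdf(s, lam)
            resp.append(g / (g + e))
        # M-step: responsibility-weighted maximum likelihood updates.
        n_rel = sum(resp)
        n_non = len(scores) - n_rel
        w = n_rel / len(scores)
        mu = sum(r * s for r, s in zip(resp, scores)) / n_rel
        var = sum(r * (s - mu) ** 2 for r, s in zip(resp, scores)) / n_rel
        sigma = max(1e-3, math.sqrt(var))
        lam = n_non / sum((1.0 - r) * s for r, s in zip(resp, scores))
    return w, lam, mu, sigma
```

Because EM only finds a local maximum, the result depends on the starting point; a different initial guess may give a different fit.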
17
Mixture Model Fit: INQUERY
18
Mixture Model Fit: SMART
19
Posterior Probabilities
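Given fitted mixture parameters, Bayes' rule turns a raw score into a probability of relevance. A minimal sketch follows; the parameter names (w for the relevant fraction, lam for the exponential rate, mu and sigma for the Gaussian) are assumptions, since the slide's formula is missing.

```python
import math


def posterior_relevant(s, w, lam, mu, sigma):
    """P(relevant | score s) under the exponential + Gaussian mixture."""
    z = (s - mu) / sigma
    rel = w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    non = (1.0 - w) * lam * math.exp(-lam * s)
    return rel / (rel + non)


if __name__ == "__main__":
    # Illustrative parameters only, not values from the paper.
    print(posterior_relevant(0.1, w=0.1, lam=5.0, mu=2.0, sigma=0.2))
    print(posterior_relevant(2.0, w=0.1, lam=5.0, mu=2.0, sigma=0.2))
```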
20
Posterior Probabilities: SMART
21
Limitations of Posterior Probabilities
22
Problem I: Mixture Model
- Model selection: exponential and Gaussian?
  - Fit the data well
  - Can be recovered with the EM algorithm
- EM algorithm: limitations and solutions
  - Limitation: local maxima
  - Solutions
    - Arbitrary initial conditions
    - Fit the exponential distribution first, then remove the documents that do not fit it well and use them to fit the Gaussian
23
Problem II: Shapes of Distributions
24
Shapes of Poisson Distributions
25
Applications
- Combining outputs of search engines
  - Using posterior probabilities
- Automatic engine selection
  - Distinction: larger distance between the mean and the intersection point of the two distributions
  - Relevance: higher maximum of the posterior probabilities
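One simple way to combine engines once every score has been mapped to a posterior probability is to average the posteriors per document and rank by the average. The sketch below assumes that rule. It also assumes a document missing from an engine's list counts as probability 0 for that engine, which is my choice for illustration, and the names are hypothetical.

```python
def combine_by_posterior(engine_posteriors):
    """Rank documents by their mean posterior probability over all engines.

    engine_posteriors: list of dicts mapping doc id -> P(relevant | score).
    """
    docs = set().union(*engine_posteriors)
    n = len(engine_posteriors)
    fused = {
        d: sum(p.get(d, 0.0) for p in engine_posteriors) / n  # missing -> 0.0
        for d in docs
    }
    # Highest mean posterior first; ties broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    inquery = {"a": 0.9, "b": 0.2}
    smart = {"b": 0.6, "c": 0.5}
    print(combine_by_posterior([inquery, smart]))
```

Because posteriors from different engines live on the same 0 to 1 scale, no further score normalization is needed before averaging.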
26
Comparative Study: Combining
27
Comparative Study: Selecting
28
What Can I Learn from this Paper?
- Scientific methodology
  - Clear and simple models
  - Theoretical reasoning & experimental support
  - Natural and simple mathematical methods
  - Standard test data and comparative study
29
Alternative Method
- Bayes Optimal Metasearch: A Probabilistic Model for Combining the Results of Multiple Retrieval Systems
- J. A. Aslam & M. Montague, Dartmouth College
- SIGIR'00
30
Probabilistic Model
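The slide's formula is missing. As a hedged reconstruction of a naive Bayes metasearch model, assume r_i is the rank that engine i gives a document and that the ranks are independent given relevance. The odds of relevance then factor as below, and documents can be ranked by the log-odds. The symbols are my notation, not necessarily the authors'.

```latex
\[
\frac{P(\text{rel} \mid r_1, \dots, r_n)}{P(\text{irr} \mid r_1, \dots, r_n)}
= \frac{P(\text{rel})}{P(\text{irr})}
  \prod_{i=1}^{n} \frac{P(r_i \mid \text{rel})}{P(r_i \mid \text{irr})}
\]
\[
\text{score}(d) = \sum_{i=1}^{n}
  \log \frac{P(r_i \mid \text{rel})}{P(r_i \mid \text{irr})}
\]
```

The per-engine probabilities P(r_i | rel) and P(r_i | irr) have to be estimated from training data.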
31
Comparisons
- manmatha01modeling
  - Pros
    - Clear and simple models
  - Cons
    - Strong model assumptions
    - Some inherent limitations of the EM algorithm
- aslam01Bayes
  - Training prior probabilities
  - Naive Bayes independence assumptions
32
My Thoughts
- Training the prior probabilities could give more accurate output models
- The small number of samples limits the use of traditional statistics; maybe Bayesian analysis can avoid this limitation
33
References
- R. Manmatha, T. Rath, and Fangfang Feng. Modeling Score Distributions for Combining the Outputs of Search Engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
- J. A. Aslam and M. Montague. Bayes Optimal Metasearch: A Probabilistic Model for Combining the Results of Multiple Retrieval Systems. In Proceedings of the 23rd ACM SIGIR Conference on Research and Development in Information Retrieval, pages , 2000.
- Jiangsheng Yu. Expectation Maximization: An Approach to Parameter Estimation. Lecture of Machine Learning Seminar, 2003.