Published by Roderick Chad Lane. Modified over 9 years ago.
1
Top-k Query Evaluation with Probabilistic Guarantees
By Martin Theobald, Gerhard Weikum, Ralf Schenkel
2
Contents
– Problem
– Past algorithms
– Contribution in this paper
– Approach: differences
– Results, observation and conclusion
3
Relevance Searching
– Users are interested in only one or a few relevant and novel data items/links.
– A user may not care if some of the returned links are not that useful.
– Precision: the fraction of the returned top-k that is actually in the true top-k.
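Precision as defined on this slide, written as a small Python helper (the function name is ours, not from the paper):

```python
from typing import Sequence

def precision_at_k(returned: Sequence[str], true_top_k: Sequence[str]) -> float:
    """Fraction of the returned top-k items that are also in the true top-k."""
    if not returned:
        return 0.0
    truth = set(true_top_k)
    # Count returned items that the exact algorithm would also have returned.
    return sum(1 for item in returned if item in truth) / len(returned)
```

For example, if 2 of the 4 returned items are in the true top-4, precision is 0.5.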
5
Algorithms we have learned …
– Fagin’s TA algorithm
– TA-Random. Problem: random accesses are expensive.
– TA-Sorted. Problem: sorted indices may not always be available.
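To make TA-Sorted concrete, here is a minimal sketch under our own assumptions: in-memory lists sorted by descending score, sum aggregation, missing scores counted as 0, and one round of sorted accesses per step. Each candidate d carries a worstscore (sum of the scores seen so far) and a bestscore (worstscore plus the current scan-position score high_i of every list where d is still unseen); min-k is the worstscore of the current rank-k candidate. All names are illustrative.

```python
from typing import Dict, List, Tuple

def ta_sorted(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Top-k by summed score using sorted accesses only (no random access)."""
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # doc -> {list index: score seen there}
    high = [0.0] * m  # upper bound for any still-unseen score in each list

    def worstscore(d: str) -> float:
        # Sum of the scores seen so far; unseen scores count as 0.
        return sum(seen[d].values())

    def bestscore(d: str) -> float:
        # Seen scores plus the highest score still possible in each unseen list.
        return worstscore(d) + sum(high[i] for i in range(m) if i not in seen[d])

    for depth in range(max((len(lst) for lst in lists), default=0)):
        for i, lst in enumerate(lists):  # one sorted access per list
            if depth < len(lst):
                doc, score = lst[depth]
                seen.setdefault(doc, {})[i] = score
                high[i] = score
            else:
                high[i] = 0.0  # exhausted list: nothing unseen remains
        ranked = sorted(seen, key=worstscore, reverse=True)
        if len(ranked) < k:
            continue
        min_k = worstscore(ranked[k - 1])
        # Stop when neither another candidate nor a completely unseen
        # document (bounded by sum(high)) can still exceed min-k.
        if all(bestscore(d) <= min_k for d in ranked[k:]) and sum(high) <= min_k:
            break
    ranked = sorted(seen, key=worstscore, reverse=True)
    return [(d, worstscore(d)) for d in ranked[:k]]
```

Nothing here uses random access; the returned scores are worstscores, i.e. lower bounds for documents that have not yet been seen in every list.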
6
Content Problem Past algorithms Contribution in this paper Approach –Differences Results, Observation and Conclusion
7
Contribution
– A probabilistic threshold test p(d)
– Given the part of a candidate’s score seen so far, it asks: “What is the probability that the tuple can still be in the final top-k?”
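The test can be written out as follows. This is a sketch in notation we introduce here, not copied from the slides: E(d) is the set of index lists in which d has already been seen, S_i is d’s still-unknown score in list i, bounded above by high_i (the score at the current scan position of list i), and ε is a user-chosen tolerance. Under sum aggregation, d can only enter the top-k if its unseen scores make up the gap δ(d) between min-k and its worstscore:

```latex
\delta(d) = \textit{min-k} - \textit{worstscore}(d), \qquad
p(d) = P\Big[\sum_{i \notin E(d)} S_i > \delta(d)\Big], \qquad
0 \le S_i \le \textit{high}_i
```

A candidate whose p(d) falls below ε is dropped; a larger ε prunes more aggressively at the cost of result precision.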
10
Approach
Probabilistic score prediction, modeling the unseen part of a candidate’s score with:
– Uniform distributions
– Histograms
– Poisson distributions, an approximation technique that is computationally cheaper than histograms
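For the uniform model, a minimal sketch: assume (our simplification) that d’s unseen score in each remaining list i is independent and uniform on [0, high_i], where high_i is the score at the current scan position. The probability that these unseen scores together exceed δ = min-k − worstscore(d) then follows from the exact inclusion-exclusion formula for the CDF of a sum of uniforms. The function name is hypothetical.

```python
from itertools import combinations
from math import factorial, prod
from typing import Sequence

def prob_uniform_sum_exceeds(delta: float, highs: Sequence[float]) -> float:
    """P[S_1 + ... + S_n > delta] for independent S_i ~ Uniform(0, highs[i])."""
    bounds = [h for h in highs if h > 0]  # a list with high_i = 0 contributes nothing
    if delta < 0:
        return 1.0  # unseen scores are non-negative, so their sum always exceeds delta
    if delta >= sum(bounds):
        return 0.0  # even the largest possible sum cannot exceed delta
    n = len(bounds)
    # Inclusion-exclusion CDF of a sum of uniforms:
    # P[sum <= x] = sum over subsets J of (-1)^|J| * max(0, x - sum(J))^n / (n! * prod(bounds))
    total = 0.0
    for r in range(n + 1):
        for subset in combinations(bounds, r):
            t = delta - sum(subset)
            if t > 0:
                total += (-1) ** r * t ** n
    cdf = total / (factorial(n) * prod(bounds))
    return min(1.0, max(0.0, 1.0 - cdf))
```

With δ computed per candidate, p(d) is `prob_uniform_sum_exceeds(min_k - worstscore, [high_i for each unseen list i])`. The formula has 2^n terms for n unseen lists, so it suits queries over few lists.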
11
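Histogram
Value ranges are grouped into buckets, each holding a probability; the bucket probabilities sum to 1 (∑ probability = 1).
[Figure: probability per bucket over value ranges]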
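A histogram model can be sketched the same way: describe each unseen score by a histogram over equal-width buckets and convolve the histograms to obtain the distribution of their sum. Assumptions here (ours): all histograms share the bucket width `width`, and each bucket’s probability is treated as a point mass at the bucket’s lower edge, which tends to underestimate the sum (finer buckets reduce the error). Function names are hypothetical.

```python
from typing import List, Sequence

def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Distribution of X + Y for independent bucketed X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def prob_hist_sum_exceeds(delta: float, hists: Sequence[Sequence[float]], width: float) -> float:
    """Approximate P[sum of unseen scores > delta] from per-list histograms."""
    dist: List[float] = [1.0]  # an empty sum is 0 with probability 1
    for h in hists:
        if abs(sum(h) - 1.0) > 1e-9:
            raise ValueError("bucket probabilities must sum to 1")
        dist = convolve(dist, h)
    # Bucket k of the summed distribution stands for the value k * width.
    return sum(p for k, p in enumerate(dist) if k * width > delta)
```

Each convolution multiplies the work by the number of buckets, which is the cost that cheaper approximations such as the Poisson model aim to avoid.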
12
Algorithms
– Conservative algorithm
– Aggressive algorithm
– Progressive algorithm
– Smart algorithm
13
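Conservative Algorithm
– Simply re-predicts the score of every candidate object in every step
– Maintains a priority queue for each group of candidates that share the same unseen part (i.e., have not yet been seen in the same set of lists)
– Incurs very high overhead for the probabilistic threshold test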
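One plausible reading of “a priority queue for each group”, sketched under our own assumptions: candidates are grouped by the set of lists in which they are still unseen, and each group is a min-heap on worstscore. Because all members of a group share the same unseen lists, p(d) can only grow with worstscore, so pruning a group can stop at its first candidate that passes the test. The class, its methods, and the `p_fn` callback (probability that the unseen scores in the given lists exceed δ) are hypothetical.

```python
import heapq
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical callback: p_fn(delta, unseen_lists) -> probability that the
# candidate's unseen scores in those lists sum to more than delta.
PFn = Callable[[float, FrozenSet[int]], float]

class GroupedCandidateQueues:
    """One min-heap on worstscore per set of not-yet-seen lists."""

    def __init__(self) -> None:
        self.groups: Dict[FrozenSet[int], List[Tuple[float, str]]] = {}

    def push(self, doc: str, worstscore: float, unseen: FrozenSet[int]) -> None:
        heapq.heappush(self.groups.setdefault(unseen, []), (worstscore, doc))

    def prune(self, min_k: float, p_fn: PFn, eps: float) -> List[str]:
        """Drop candidates whose probability of reaching the top-k is below eps."""
        dropped: List[str] = []
        for unseen, heap in self.groups.items():
            # Same unseen lists => p(d) is non-decreasing in worstscore, so test
            # the lowest worstscore first and stop at the first survivor.
            while heap and p_fn(min_k - heap[0][0], unseen) < eps:
                dropped.append(heapq.heappop(heap)[1])
        return dropped
```

The sketch ignores updates: a candidate seen in a new list moves to another group, which a fuller version would have to handle (for instance with lazy deletion). `p_fn` could be any of the score-prediction models listed under Approach.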
14
Aggressive Algorithm
– As soon as the score of an object falls below the min-k threshold, the algorithm stops immediately
– Minimal overhead, but low result precision
15
Progressive Algorithm
– A middle ground between the conservative and aggressive algorithms
– Tracks changes in the best scores at uniform intervals
– Maintains a single priority queue
16
Smart Algorithm
– Rebuilding the entire queue is itself costly when the queue is large, as with big datasets
– Maintains only a bounded priority queue: whenever it is rebuilt, only the best b elements are kept
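Both the progressive and the smart variants periodically rebuild their queue with fresh priorities; the smart variant keeps only the best b. A minimal sketch of that rebuild step, assuming the caller supplies some score-based `priority` function (for example bestscore or p(d)); the function name and signature are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

def rebuild_queue(candidates: Iterable[str], priority: Callable[[str], float],
                  b: Optional[int] = None) -> List[Tuple[float, str]]:
    """Rebuild a max-priority queue from scratch with freshly computed priorities.

    b=None keeps every candidate (a full rebuild); an integer b keeps only the
    b best candidates, giving a bounded queue.
    """
    scored = [(priority(d), d) for d in candidates]
    if b is not None:
        scored = heapq.nlargest(b, scored)  # O(n log b) instead of a full sort
    # heapq is a min-heap, so store negated priorities to pop the best first.
    heap = [(-p, d) for p, d in scored]
    heapq.heapify(heap)
    return heap
```

Bounding the queue keeps each rebuild cheap on large candidate sets, at the risk of discarding candidates whose priority would have risen later.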
18
Experiment
19
Conclusion
Probabilistic score predictions can substantially reduce execution time in exchange for some loss of top-k result quality.