Published by Pearl Richardson. Modified over 9 years ago.
1
Advanced Search Features
Dr. Susan Gauch
2
Pruning Search Results
- If a query term has many postings, it is inefficient to add all of them to the accumulator and then sort the results.
- Just reading all postings from the inverted file does not scale when a word may appear in a billion documents.
- So, process only the highest-weighted postings for a given query term.
- How many to use? Several thousand, so that a given document still has a chance of accumulating weights from multiple query terms.
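The query-time side of this idea can be sketched in a few lines. This is only an illustration, not the course's implementation: the function name `pruned_search`, the in-memory `index` dictionary, and the tiny value of `p` are all assumptions made for the example, and each postings list is assumed to be stored already sorted by weight.

```python
from collections import defaultdict


def pruned_search(query_terms, index, idf, p):
    """Score documents using at most p postings per query term.

    Assumption: index[term] is a list of (doc_id, rtf) pairs that is
    already sorted by rtf, highest first.
    """
    accumulator = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, [])
        # Read only the p highest-weighted postings for this term.
        for doc_id, rtf in postings[:p]:
            accumulator[doc_id] += rtf * idf.get(term, 0.0)
    # Only the (small) accumulator is sorted, not every posting.
    return sorted(accumulator.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy index; real lists would hold thousands of postings.
index = {
    "search": [(1, 0.5), (2, 0.25), (3, 0.125)],
    "engine": [(2, 0.5), (4, 0.25)],
}
idf = {"search": 1.0, "engine": 2.0}
results = pruned_search(["search", "engine"], index, idf, p=2)
```

With `p=2`, document 3 is never read for "search", while document 2 still collects weight from both query terms, which is the reason for keeping p reasonably large.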
3
Pruning Search Results: Implementation
- Must sort all postings for a given term by weight during indexing.
  - Since all postings for a given term have the same idf, sort postings by rtf during indexing.
- Can also affect incremental indexing.
- Keep at most P postings for any given term, sorted in descending order by rtf.
  - If only p postings per term (max) are processed at query time, keep only P = p*4 in the inverted file.
- Run experiments on P: how many postings do you need to process to get unchanged top results?
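A minimal sketch of the indexing step described above, assuming postings are held in memory as `(doc_id, rtf)` pairs. The function name `build_pruned_postings` and the `factor` parameter are hypothetical; the default of 4 simply mirrors P = p*4 from the slide.

```python
def build_pruned_postings(raw_postings, p, factor=4):
    """Sort each term's postings by rtf and keep at most P = p * factor.

    raw_postings maps term -> list of (doc_id, rtf) pairs in any order.
    """
    max_kept = p * factor
    pruned = {}
    for term, postings in raw_postings.items():
        # idf is constant per term, so ordering by rtf orders by weight.
        ranked = sorted(postings, key=lambda posting: posting[1], reverse=True)
        pruned[term] = ranked[:max_kept]
    return pruned


# Toy data: five postings, p = 1, so P = 4 postings survive.
raw = {"nation": [(1, 0.1), (2, 0.4), (3, 0.2), (4, 0.3), (5, 0.05)]}
pruned = build_pruned_postings(raw, p=1)
```

Running the same query set against indexes built with different values of `p` and `factor`, and comparing the top results, is one way to carry out the experiment the slide suggests.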
4
Pruning Search Results: Incremental Indexing
- Puts a bound on the possible growth of the postings file.
  - Only ever storing P postings for a given term.
- Makes adding to the postings slower.
  - Must insert each new posting at the right location, by weight, in the list of postings for the term.
- Have a max of P postings per term.
  - Can pre-allocate P posting records per term.
  - Never have to move postings around.
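The bounded, weight-ordered insertion can be illustrated as follows. This is a sketch under assumptions: the list lives in memory rather than in pre-allocated file records, and `insert_posting` and `max_postings` are names invented for the example.

```python
import bisect


def insert_posting(postings, doc_id, rtf, max_postings):
    """Insert (doc_id, rtf) into a list kept sorted by rtf, highest first.

    The list never grows beyond max_postings entries. Returns True if the
    posting was stored, False if it was too weak to make the cut.
    """
    if len(postings) >= max_postings and rtf <= postings[-1][1]:
        return False  # list is full and the new posting is the weakest
    # bisect works on ascending keys, so search on negated weights.
    keys = [-weight for _, weight in postings]
    position = bisect.bisect_right(keys, -rtf)
    postings.insert(position, (doc_id, rtf))
    if len(postings) > max_postings:
        postings.pop()  # evict the lowest-weighted posting
    return True


postings = []
for doc_id, rtf in [(1, 0.2), (2, 0.5), (3, 0.1)]:
    insert_posting(postings, doc_id, rtf, max_postings=3)

stored = insert_posting(postings, 4, 0.3, max_postings=3)    # evicts doc 3
rejected = insert_posting(postings, 5, 0.05, max_postings=3)  # too weak
```

Because the list length is capped, the per-term storage is fixed, which is what makes pre-allocating P records per term possible.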
5
Bounded Accumulator
- If you create a bounded-size accumulator, you want it to store the highest-weighted results.
- Can achieve the best results by adding the highest-weighted postings to the accumulator first.
  - Then make minor adjustments by adding lower-weight postings.
- This is achieved by processing the query terms with the highest idf first.
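One possible reading of the bounded accumulator, sketched with hypothetical names (`bounded_search`, `max_entries`). The policy assumed here is that once the accumulator is full, postings for new documents are dropped and only documents already present have their scores adjusted; the slide does not specify the exact policy.

```python
def bounded_search(query_terms, index, idf, max_entries):
    """Accumulate scores for at most max_entries documents.

    Terms are processed in descending idf order so that the largest
    contributions claim accumulator slots first.
    """
    accumulator = {}
    ordered_terms = sorted(
        query_terms, key=lambda term: idf.get(term, 0.0), reverse=True
    )
    for term in ordered_terms:
        for doc_id, rtf in index.get(term, []):
            weight = rtf * idf.get(term, 0.0)
            if doc_id in accumulator:
                accumulator[doc_id] += weight  # minor adjustment
            elif len(accumulator) < max_entries:
                accumulator[doc_id] = weight   # claim a free slot
            # Otherwise the accumulator is full: skip the new document.
    return sorted(accumulator.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: "the" is common (low idf), "accumulator" is rare (high idf).
index = {
    "the": [(1, 0.5), (2, 0.5), (3, 0.25)],
    "accumulator": [(3, 0.5), (4, 0.25)],
}
idf = {"the": 0.5, "accumulator": 4.0}
top = bounded_search(["the", "accumulator"], index, idf, max_entries=2)
```

Here the rare term fills both slots first, and the common term only nudges the score of document 3, which already holds a slot.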
6
Wildcards
- Usually not implemented in web search engines.
- Wildcards at the end: Nation*
  - Matches nation, nations, nationality, nationalization, …
- Requires:
  - Sorted dictionary (inefficient; could use a B+ tree instead of a hashtable)
  - Stemming: map words to stems during indexing; store stems in the dict file
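To show why a sorted dictionary matters for trailing wildcards, here is a small sketch that uses an in-memory sorted list and binary search as a stand-in for a B+ tree. The function name `prefix_matches` and the sample terms are assumptions made for the example; a hashtable offers no comparable way to find all keys sharing a prefix without scanning every key.

```python
import bisect


def prefix_matches(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.

    sorted_terms must be in ascending order, so all matching terms form
    one contiguous run that starts at the binary-search position.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


dictionary = sorted(
    ["notion", "nations", "native", "nation", "nationality", "nature", "national"]
)
# The query Nation* becomes a lowercase prefix lookup.
matches = prefix_matches(dictionary, "nation")
```

Each matching term's postings would then be merged as if the user had typed all of the matching terms.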