Advanced Search Features Dr. Susan Gauch

Pruning Search Results
- If a query term has many postings:
  - It is inefficient to add all postings to the accumulator and then sort the results
  - Just reading all postings from the inverted file is not scalable when a word may be in a billion documents
- So, process the highest-weighted postings for a given query term
- How many to use?
  - Several thousand, so that we have the chance of adding weights from multiple query terms for a given document
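The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the in-memory `index` layout (term mapped to postings already sorted by weight, highest first), the function name `pruned_search`, and the sample weights are all assumptions made for the example.

```python
from collections import defaultdict


def pruned_search(index, query_terms, p, top_k=10):
    """Score documents using only the p highest-weighted postings per term.

    Assumption: index maps term -> list of (doc_id, weight) pairs that are
    already sorted by weight, highest first.
    """
    accumulator = defaultdict(float)
    for term in query_terms:
        # Slice off the first p postings; the rest are never read.
        for doc_id, weight in index.get(term, [])[:p]:
            accumulator[doc_id] += weight
    # Highest score first; ties broken by doc_id for a stable order.
    ranked = sorted(accumulator.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy index with made-up weights.
index = {
    "kansas": [("d1", 0.9), ("d2", 0.5), ("d3", 0.1)],
    "search": [("d2", 0.8), ("d1", 0.3), ("d4", 0.2)],
}
results = pruned_search(index, ["kansas", "search"], p=2)
```

With `p=2`, documents `d3` and `d4` are never added to the accumulator, while `d1` and `d2` still collect weight from both query terms, which is the reason for choosing p large enough that postings lists overlap.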

Pruning Search Results: Implementation
- Must sort all postings for a given term by weight during indexing
  - Since all postings for a given term have the same idf, sort postings by rtf during indexing
- Can also affect incremental indexing
- Keep at most P postings for any given term, sorted by rtf
- If only processing p postings per term (max) at query time, only keep P = p*4 in the inverted file
- Run experiments on P
  - How many postings do you need to process to get unchanged top results?
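A minimal sketch of the indexing-time step and of the suggested experiment follows. The input layout (`term_doc_rtf`), the function names, and the sample rtf values are hypothetical, and the scoring helper uses rtf alone (no idf) purely to keep the example short.

```python
def build_pruned_postings(term_doc_rtf, max_postings):
    """Sort each term's postings by rtf (highest first) and keep at most P.

    Assumption: term_doc_rtf maps term -> {doc_id: rtf}.
    """
    pruned = {}
    for term, doc_rtf in term_doc_rtf.items():
        ranked = sorted(doc_rtf.items(), key=lambda item: (-item[1], item[0]))
        pruned[term] = ranked[:max_postings]
    return pruned


def top_docs(postings, query_terms, p, k):
    """Top-k doc ids when only the first p postings per term are processed."""
    scores = {}
    for term in query_terms:
        for doc_id, rtf in postings.get(term, [])[:p]:
            scores[doc_id] = scores.get(doc_id, 0.0) + rtf  # rtf-only scoring
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:k]]


def smallest_stable_p(postings, query_terms, k, max_p):
    """Smallest p whose top-k list equals the list obtained with max_p."""
    reference = top_docs(postings, query_terms, max_p, k)
    for p in range(1, max_p + 1):
        if top_docs(postings, query_terms, p, k) == reference:
            return p
    return max_p


# Hypothetical rtf values for one term.
term_doc_rtf = {"nation": {"d1": 0.10, "d2": 0.40, "d3": 0.25, "d4": 0.05}}
pruned = build_pruned_postings(term_doc_rtf, max_postings=3)
stable_p = smallest_stable_p(pruned, ["nation"], k=2, max_p=3)
```

A real experiment would average `smallest_stable_p` over many queries before fixing p and P.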

Pruning Search Results: Incremental Indexing
- Puts a bound on possible growth of the postings file
  - Only ever storing P postings for a given term
- Makes adding to the postings slower
  - Must insert each new posting at the right location, by weight, in the term's list of postings
- Have a max of P postings per term
  - Can pre-allocate P posting records per term
  - Never have to move postings around
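One way to picture the pre-allocated, bounded postings list is the in-memory sketch below. The class name `BoundedPostings` and its methods are hypothetical; a real system would keep the P records in a fixed-size region of the postings file rather than in a Python list.

```python
class BoundedPostings:
    """At most `capacity` postings for one term, kept sorted by weight."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.slots = [None] * capacity  # pre-allocated posting records
        self.count = 0                  # number of slots currently in use

    def add(self, doc_id, weight):
        """Insert a posting in weight order; return False if it is pruned."""
        capacity = len(self.slots)
        if self.count == capacity and weight <= self.slots[-1][1]:
            return False  # list is full and the new posting is not better
        # Start at the first free slot, or overwrite the weakest posting.
        pos = self.count if self.count < capacity else capacity - 1
        # Shift lower-weight postings one slot down to open a gap.
        while pos > 0 and self.slots[pos - 1][1] < weight:
            self.slots[pos] = self.slots[pos - 1]
            pos -= 1
        self.slots[pos] = (doc_id, weight)
        if self.count < capacity:
            self.count += 1
        return True

    def postings(self):
        """Postings currently stored, highest weight first."""
        return self.slots[: self.count]


# Hypothetical postings for one term with P = 3.
term_postings = BoundedPostings(3)
for doc_id, weight in [("d1", 0.2), ("d2", 0.5), ("d3", 0.3)]:
    term_postings.add(doc_id, weight)
rejected = term_postings.add("d4", 0.1)   # weaker than everything stored
accepted = term_postings.add("d5", 0.4)   # displaces the weakest posting
```

The shifting happens only inside the term's own P slots, so the term's block never has to move within the postings file, at the cost of a slower insert.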

Bounded Accumulator
- If you create a bounded-size accumulator, you want it to store the highest-weighted results
- Can achieve best results by adding the highest-weighted postings to the accumulator first
- Then make minor adjustments by adding lower-weight postings
- This is achieved by processing query terms with the highest idf first
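A minimal sketch of a bounded accumulator is shown below. The admission policy (once the accumulator is full, new documents are ignored and only existing entries are updated) is an assumption chosen for the example, as are the function name, the `index` and `idf` layouts, and the sample values.

```python
def bounded_search(index, idf, query_terms, max_accumulators):
    """Accumulate rtf * idf scores in an accumulator of bounded size.

    Assumptions: index maps term -> list of (doc_id, rtf); idf maps
    term -> idf value. Terms are processed from highest idf to lowest.
    """
    accumulator = {}
    ordered_terms = sorted(query_terms, key=lambda t: -idf.get(t, 0.0))
    for term in ordered_terms:
        term_idf = idf.get(term, 0.0)
        for doc_id, rtf in index.get(term, []):
            weight = rtf * term_idf
            if doc_id in accumulator:
                accumulator[doc_id] += weight      # minor adjustment
            elif len(accumulator) < max_accumulators:
                accumulator[doc_id] = weight       # still room: admit doc
            # Otherwise the accumulator is full and the document is skipped.
    return sorted(accumulator.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical index: a common word and a rare word.
index = {
    "the": [("d1", 0.9), ("d2", 0.8), ("d3", 0.7)],
    "gauch": [("d3", 0.6), ("d4", 0.2)],
}
idf = {"the": 0.1, "gauch": 3.0}
ranked = bounded_search(index, idf, ["the", "gauch"], max_accumulators=2)
```

Because the rare term `gauch` is processed first, its documents fill the two accumulator slots; the common term `the` then only adjusts the score of `d3` and cannot crowd out the better candidates.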

Wildcards
- Usually not implemented in web search engines
- Wildcards at the end: Nation*
  - Matches nation, nations, nationality, nationalization, …
- Requires:
  - Sorted dictionary (inefficient; could use a B+ tree instead of a hashtable)
- Stemming:
  - Map words to stems during indexing
  - Store stems in the dict file
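To see why a trailing wildcard needs a sorted dictionary, consider the sketch below: in sorted order, all terms sharing a prefix sit next to each other, so a binary search finds the first one and a short scan collects the rest. A sorted Python list stands in for the B+ tree here, and the function name and term list are made up for the example.

```python
import bisect


def prefix_matches(sorted_terms, prefix):
    """Return every dictionary term that starts with prefix.

    Assumption: sorted_terms is a list of terms in ascending sorted order.
    """
    # Binary search for the first term that is >= prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    # Matching terms are contiguous, so scan until the prefix stops matching.
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


# Hypothetical dictionary terms.
terms = sorted(
    ["nature", "nation", "native", "nations", "nationality", "national"]
)
matches = prefix_matches(terms, "nation")
```

A hashtable cannot answer this query without scanning every key, because hashing scatters terms with a common prefix across the table.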