CS122B: Projects in Databases and Web Applications Winter 2017

Professor Chen Li, Department of Computer Science, UC Irvine
Notes 13: Inverted Index
Slides borrowed from Prof. Manning at Stanford

Query
Which documents contain the words Cat AND Dog but NOT Fish?

Inverted index
For each term T, we must store a list of all documents that contain T.
Do we use an array or a list for this?
  Cat   2 4 8 16 32 64 128
  Dog   1 2 3 5 8 13 21 34
  Fish  13 16
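
A minimal sketch of how such a term-to-postings structure could be built. The function name `build_inverted_index`, the toy `docs` collection, and the whitespace tokenization are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of the docIDs whose text contains it.

    `docs` is a hypothetical {docID: text} dictionary. Terms are obtained by
    lowercasing and splitting on whitespace, which is a simplifying assumption.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sorting by docID is what makes the linear-time merge possible later.
    return {term: sorted(ids) for term, ids in postings.items()}


# Toy collection (made up for this example).
docs = {1: "dog", 2: "cat dog", 4: "cat", 13: "dog fish", 16: "cat fish"}
index = build_inverted_index(docs)
print(index["cat"])   # [2, 4, 16]
print(index["fish"])  # [13, 16]
```

A Python list is array-backed, so this sketch sidesteps the array-versus-linked-list question raised on the slide.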

Inverted index
Linked lists are generally preferred to arrays:
  Dynamic space allocation
  Insertion of terms into documents is easy
  Cost: space overhead of pointers
Dictionary  Postings
  Cat       2 4 8 16 32 64 128
  Dog       1 2 3 5 8 13 21 34
  Fish      13 16

Query processing
Consider processing the query: Cat AND Dog
  Locate Cat in the Dictionary; retrieve its postings.
  Locate Dog in the Dictionary; retrieve its postings.
  "Merge" the two postings:
  Cat  2 4 8 16 32 64 128
  Dog  1 2 3 5 8 13 21 34

The merge
Walk through the two postings simultaneously, in time linear in the total number of postings entries.
  Cat     2 4 8 16 32 64 128
  Dog     1 2 3 5 8 13 21 34
  Result  2 8
If the list lengths are x and y, the merge takes O(x+y) operations.
Crucial: postings sorted by docID.
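
The merge can be sketched as a two-pointer walk over the sorted lists. The function name `intersect` is an assumption for illustration; the postings are the Cat and Dog lists from the slides.

```python
def intersect(p1, p2):
    """Return the docIDs present in both sorted postings lists.

    Each step advances at least one pointer, so the loop runs at most
    len(p1) + len(p2) times. Both inputs must be sorted by docID.
    """
    answer = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # p1[i] is too small to match anything left in p2
        else:
            j += 1  # p2[j] is too small to match anything left in p1
    return answer


cat = [2, 4, 8, 16, 32, 64, 128]
dog = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(cat, dog))  # [2, 8]
```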

Boolean queries: Exact match
Boolean queries are queries using AND, OR and NOT together with query terms.
  Views each document as a set of words.
  Is precise: a document either matches the condition or it does not.
Primary commercial retrieval tool for 3 decades.
Professional searchers (e.g., lawyers) still like Boolean queries: you know exactly what you're getting.
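
As an illustration of exact-match semantics, the opening query (Cat AND Dog but NOT Fish) could be evaluated roughly as follows. The helper name `and_not` is hypothetical, and the AND step uses Python set intersection as shorthand rather than a merge.

```python
def and_not(p1, p2):
    """Return the docIDs in sorted list p1 that do not appear in sorted list p2."""
    answer = []
    j = 0
    for doc_id in p1:
        # Skip entries of p2 that are smaller than the current docID.
        while j < len(p2) and p2[j] < doc_id:
            j += 1
        if j == len(p2) or p2[j] != doc_id:
            answer.append(doc_id)
    return answer


cat = [2, 4, 8, 16, 32, 64, 128]
dog = [1, 2, 3, 5, 8, 13, 21, 34]
fish = [13, 16]

# Cat AND Dog (set intersection used here only as shorthand).
cat_and_dog = sorted(set(cat) & set(dog))
# ... but NOT Fish.
result = and_not(cat_and_dog, fish)
print(result)  # [2, 8]
```

Each document either ends up in `result` or it does not; there is no ranking or partial match.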

Other challenges
  Stemming
  Tokenization
  Stop words
  Synonyms
Especially hard for non-Latin languages, e.g., Chinese, Japanese
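
A deliberately naive sketch of what tokenization, stop-word removal, and stemming might look like for English text. The stop-word list, the plural-stripping rule, and the name `normalize` are all assumptions for illustration; real systems use far more careful rules. The regular expression assumes space-delimited alphabetic text, so it would not segment Chinese or Japanese, which do not separate words with spaces.

```python
import re

# Toy stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "that"}


def normalize(text):
    """Turn raw text into index terms: tokenize, drop stop words, crude stemming."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Crude stand-in for stemming: strip a trailing "s" from longer words.
        if tok.endswith("s") and len(tok) > 3:
            tok = tok[:-1]
        terms.append(tok)
    return terms


print(normalize("The cats and the dogs"))  # ['cat', 'dog']
```

Synonyms are not handled here at all; they would need an external resource such as a thesaurus.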