INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID


Lecture # 3
Boolean Retrieval Model
Ranked Retrieval Model

ACKNOWLEDGEMENTS
The material in this lecture is drawn from the following sources:
“Introduction to Information Retrieval” by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
“Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell
“Modern Information Retrieval” by Ricardo Baeza-Yates et al.
“Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

Outline
Boolean Retrieval Model
Information Retrieval Ingredients
Westlaw
Ranked retrieval models

Boolean Retrieval Model

Boolean Retrieval Model
D1 = {This is a pen}
D2 = {It is a pen}
Set(D1, D2) = {This, It, is, a, pen}
Set: {a, b, c} = {b, a, c} (order does not matter and each element appears once)
Bag: {a, a, b, c} (duplicates are kept)
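The set and bag views of D1 and D2 can be sketched in a few lines of Python. This is only an illustration: splitting on whitespace is an assumed tokenizer, and `vocabulary` and `bag` are names chosen here, not taken from the lecture.

```python
from collections import Counter

d1 = "This is a pen"
d2 = "It is a pen"

# Set view: each distinct term appears once, and order is irrelevant.
vocabulary = set(d1.split()) | set(d2.split())

# Bag view: duplicates are kept, here as per-term counts.
bag = Counter(d1.split()) + Counter(d2.split())

print(sorted(vocabulary))
print({"a", "b", "c"} == {"b", "a", "c"})   # sets ignore order
print(Counter(["a", "a", "b", "c"])["a"])   # a bag remembers that "a" occurs twice
```

The Boolean model only needs the set view, since it asks whether a term occurs in a document, not how often.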

Boolean queries
The Boolean retrieval model can answer any query that is a Boolean expression.
Boolean queries use AND, OR and NOT to join query terms.
The model views each document as a set of terms.
It is precise: a document either matches the condition or it does not.
It was the primary commercial retrieval tool for three decades.
Many professional searchers (e.g., lawyers) still like Boolean queries: you know exactly what you are getting.
Many search systems you use are also Boolean: Spotlight, email, intranet search, etc.
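Because each document is treated as a set of terms, AND, OR and NOT map directly onto set intersection, union and difference over the document IDs that contain each term. The following is a minimal sketch under assumptions: the three-document collection, the whitespace tokenizer, and the helper names `build_index` and `postings` are all made up for illustration.

```python
docs = {
    1: "This is a pen",
    2: "It is a pen",
    3: "This is a book",
}

def build_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)
all_ids = set(docs)

def postings(term):
    """Return the IDs of documents containing the term (empty set if unseen)."""
    return index.get(term.lower(), set())

pen_and_this = postings("pen") & postings("this")   # pen AND this
pen_or_book = postings("pen") | postings("book")    # pen OR book
this_not_pen = postings("this") - postings("pen")   # this AND NOT pen
not_pen = all_ids - postings("pen")                 # NOT pen

print(pen_and_this, pen_or_book, this_not_pen, not_pen)
```

A document is either in the result set or not; nothing in this model says which matching document is better, which is the gap that ranked retrieval fills.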

Information Retrieval Ingredients

Information Retrieval Ingredients
Document representation
Query formulation
Query processing

Westlaw

Commercially successful Boolean retrieval: Westlaw
Largest commercial legal search service in terms of the number of paying subscribers.
Over half a million subscribers perform millions of searches a day over tens of terabytes of text data.
The service was started in 1975.
In 2005, Boolean search (called “Terms and Connectors” by Westlaw) was still the default and was used by a large percentage of users, although ranked retrieval has been available since 1992.

Westlaw: Example queries
Information need: Information on the legal theories involved in preventing the disclosure of trade secrets by employees formerly employed by a competing company
Query: “trade secret” /s disclos! /s prevent /s employe!
Information need: Requirements for disabled people to be able to access a workplace
Query: disab! /p access! /s work-site work-place (employment /3 place)
Information need: Cases about a host’s responsibility for drunk guests
Query: host! /p (responsib! liab!) /p (intoxicat! drunk!) /p guest
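As IIR describes this syntax, `!` is a trailing wildcard, `/s`, `/p` and `/k` ask for matches in the same sentence, the same paragraph, or within k words, and a space between terms means OR. The sketch below illustrates only the wildcard and the `/k` operator on a single tokenized sentence. It is not Westlaw's implementation: the function names are hypothetical, and treating `/k` as unordered is an assumption.

```python
def matches(token, pattern):
    """'employ!' matches any token starting with 'employ'; other patterns match exactly."""
    if pattern.endswith("!"):
        return token.startswith(pattern[:-1])
    return token == pattern

def positions(tokens, pattern):
    """Word positions at which the pattern matches."""
    return [i for i, tok in enumerate(tokens) if matches(tok, pattern)]

def within_k(tokens, left, right, k):
    """True if some match of `left` lies within k words of some match of `right`."""
    return any(
        abs(i - j) <= k
        for i in positions(tokens, left)
        for j in positions(tokens, right)
    )

tokens = "access to the place of employment is required".split()

print(within_k(tokens, "employment", "place", 3))   # like (employment /3 place)
print(within_k(tokens, "access", "employ!", 3))     # too far apart in this sentence
print(positions(tokens, "employ!"))
```

Supporting `/s` and `/p` would additionally require the index to record sentence and paragraph boundaries for each position.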

Problem with Boolean search: feast or famine
Boolean search requires query-writing skills.
Boolean queries often result in either too few (= 0) or too many (1000s) results.
It takes a lot of skill to come up with a query that produces a manageable number of hits.
AND gives too few results; OR gives too many.

Ranked retrieval models

Ranked retrieval models
Rather than a set of documents satisfying a query expression, in ranked retrieval the system returns an ordering over the (top) documents in the collection for a query.
Free text queries: rather than a query language of operators and expressions, the user’s query is just one or more words in a human language.
In principle, these are two separate choices, but in practice ranked retrieval has normally been associated with free text queries and vice versa.

Feast or famine: not a problem in ranked retrieval
When a system produces a ranked result set, large result sets are not an issue.
Indeed, the size of the result set is not an issue.
We just show the top k (≈ 10) results.
We don’t overwhelm the user.
Premise: the ranking algorithm works.
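Showing only the top k results can be sketched as follows, assuming every document already has a numeric score. The scores and the `top_k` helper are hypothetical; `heapq.nlargest` is one reasonable way to avoid fully sorting a large result set.

```python
import heapq

def top_k(scores, k=10):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Hypothetical scores for five matching documents.
scores = {"d1": 0.12, "d2": 0.87, "d3": 0.45, "d4": 0.66, "d5": 0.05}

print(top_k(scores, k=3))
```

However many documents match, the user sees at most k of them, so the result size stops being a usability problem as long as the scores are meaningful.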

Scoring as the basis of ranked retrieval
We wish to return, in order, the documents most likely to be useful to the searcher.
How can we rank-order the documents in the collection with respect to a query?
Assign a score, say in [0, 1], to each document.
This score measures how well document and query “match”.
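The slide does not name a scoring function. As one simple possibility, the sketch below uses the Jaccard coefficient (the size of the term overlap divided by the size of the term union), which always falls in [0, 1]. The choice of Jaccard, the toy documents, and the whitespace tokenizer are assumptions made for illustration.

```python
def jaccard(query, document):
    """Score in [0, 1]: shared terms divided by all distinct terms."""
    q = set(query.lower().split())
    d = set(document.lower().split())
    if not q and not d:
        return 0.0
    return len(q & d) / len(q | d)

# Hypothetical toy collection.
docs = {
    "d1": "caesar died in march",
    "d2": "the long march",
    "d3": "ides of march",
}
query = "ides of march"

# Rank document IDs by descending score.
ranked = sorted(docs, key=lambda doc_id: jaccard(query, docs[doc_id]), reverse=True)

for doc_id in ranked:
    print(doc_id, round(jaccard(query, docs[doc_id]), 3))
```

A score of 1 means the query and document contain exactly the same terms, and 0 means they share none. The measure ignores how often a term occurs and how rare it is in the collection.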

Resources
Chapter 1 of IIR
Resources at http://ifnlp.org/ir
Boolean Retrieval