INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID

Slides:



Advertisements
Similar presentations
Hinrich Schütze and Christina Lioma Lecture 4: Index Construction
Advertisements

Lecture 4: Dictionaries and tolerant retrieval
CpSc 881: Information Retrieval. 2 Dictionaries The dictionary is the data structure for storing the term vocabulary. Term vocabulary: the data Dictionary:
CS276A Information Retrieval
Introduction to Information Retrieval Introduction to Information Retrieval Adapted from Christopher Manning and Prabhakar Raghavan Tolerant Retrieval.
An Introduction to IR Lecture 3 Dictionaries and Tolerant retrieval 1.
1 ITCS 6265 Lecture 3 Dictionaries and Tolerant retrieval.
Advanced topics in Computer Science Jiaheng Lu Department of Computer Science Renmin University of China
Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar.
CES 514 Data Mining Feb 17, 2010 Lecture 3: The dictionary and tolerant retrieval.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 3: Dictionaries and tolerant retrieval.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 3: Dictionaries and tolerant retrieval.
Index Compression Lecture 4. Recap: lecture 3 Stemming, tokenization etc. Faster postings merges Phrase queries.
1 INF 2914 Information Retrieval and Web Search Lecture 9: Query Processing These slides are adapted from Stanford’s class CS276 / LING 286 Information.
INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 3 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics.
INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 5 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics.
PrasadL05TolerantIR1 Tolerant IR Adapted from Lectures by Prabhakar Raghavan (Yahoo and Stanford) and Christopher Manning (Stanford)
Introduction to Information Retrieval Introduction to Information Retrieval COMP4210: Information Retrieval and Search Engines Lecture 3: Dictionaries.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 3: Dictionaries and tolerant retrieval Related to Chapter 3:
| 1 › Gertjan van Noord2014 Zoekmachines Lecture 3: tolerant retrieval.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 3: Dictionaries and tolerant retrieval Related to Chapter 3:
Spelling correction. Spell correction Two principal uses Correcting document(s) being indexed Correcting user queries to retrieve “right” answers Two.
Spelling correction. Spell correction Two principal uses Correcting document(s) being indexed Correcting user queries to retrieve “right” answers Two.
Dictionaries and Tolerant retrieval
Introduction to Information Retrieval Introduction to Information Retrieval Modified from Stanford CS276 slides Chap. 3: Dictionaries and tolerant retrieval.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 4: Skip Pointers, Dictionaries and tolerant retrieval.
Information Retrieval On the use of the Inverted Lists.
Spelling correction. Spell correction Two principal uses Correcting document(s) being indexed Retrieve matching documents when query contains a spelling.
An Introduction to IR Lecture 3 Dictionaries and Tolerant retrieval 1.
Introduction to Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Christopher Manning and Prabhakar.
Lectures 5: Dictionaries and tolerant retrieval
Web Search – Summer Term 2006 II. Information Retrieval (Basics Cont.)
Information Retrieval Christopher Manning and Prabhakar Raghavan
Dictionary data structures for the Inverted Index
Modified from Stanford CS276 slides
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture 3: Dictionaries and tolerant retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
CSCE 561 Information Retrieval System Models
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Dictionary data structures for the Inverted Index
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval Systems
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture 3: Dictionaries and tolerant retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Web Search and Advanced Internet Services
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 18 Wild Card Queries B Tree

ACKNOWLEDGEMENTS The presentation of this lecture has been taken from the underline sources “Introduction to information retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze “Managing gigabytes” by Ian H. Witten, ‎Alistair Moffat, ‎Timothy C. Bell “Modern information retrieval” by Baeza-Yates Ricardo, ‎  “Web Information Retrieval” by Stefano Ceri, ‎Alessandro Bozzon, ‎Marco Brambilla

Outline How to Handle Wild-Card Queries Wild-card queries: * B-Tree

How to Handle Wild-Card Queries B-Trees Permuterm Index K-Grams Soundex Algorithms 00:03:50  00:06:35

Wild-card queries: * mon*: find all docs containing any word beginning with “mon”. Easy with binary tree (or B-tree) lexicon: retrieve all words in range: mon ≤ w < moo *mon: find words ending in “mon”: harder Maintain an additional B-tree for terms backwards. Can retrieve all words in range: nom ≤ w < non. 00:06:50  00:07:15(mon*) 00:08:15  00:08:50(easy) Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent ?

B-Tree 00:10:20  00:11:35

B-Tree n = # of pairs. # of external nodes = n + 1. C D I H G K J F E Level 0 Level 1 Level 2 Level 3 00:16:50  00:17:10 n = # of pairs. # of external nodes = n + 1.

Wild-card queries: * mon*: find all docs containing any word beginning with “mon”. Easy with binary tree (or B-tree) lexicon: retrieve all words in range: mon ≤ w < moo *mon: find words ending in “mon”: harder Maintain an additional B-tree for terms backwards. Can retrieve all words in range: nom ≤ w < non. 00:22:50  00:24:00(Easy) Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent ?

B+ Tree 00:24:10  00:24:50 00:25:10  00:26:10

Wild-card queries example Query Sanat* AND Jayasur* mo*y *day Queries: X lookup on X$ X* lookup on $X* *X lookup on X$* *X* lookup on X* X*Y lookup on Y$X* X*Y*Z ??? Exercise! Sanat AND Jayasur …yad* 00:31:27  00:32:00 00:32:20  00:32:45 00:42:00  00:42:20 00:38:00  00:38:30 00:35:00  00:36:00 00:48:34  00:50:00

Resources Peter Norvig: How to write a spelling corrector IIR 3, MG 4.2 Efficient spell retrieval: K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992. J. Zobel and P. Dart.  Finding approximate matches in large lexicons.  Software - practice and experience 25(3), March 1995. http://citeseer.ist.psu.edu/zobel95finding.html Mikael Tillenius: Efficient Generation and Ranking of Spelling Error Corrections. Master’s thesis at Sweden’s Royal Institute of Technology. http://citeseer.ist.psu.edu/179155.html Nice, easy reading on spell correction: Peter Norvig: How to write a spelling corrector http://norvig.com/spell-correct.html