INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID


1 INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Lecture # 18: Wild-Card Queries, B-Tree

2 ACKNOWLEDGEMENTS
The material in this lecture has been taken from the following sources:
“Introduction to Information Retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze
“Managing Gigabytes” by Ian H. Witten, Alistair Moffat, and Timothy C. Bell
“Modern Information Retrieval” by Ricardo Baeza-Yates
“Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, and Marco Brambilla

3 Outline
How to Handle Wild-Card Queries
Wild-card queries: *
B-Tree

4 How to Handle Wild-Card Queries
B-Trees
Permuterm Index
K-Grams
Soundex Algorithms
00:03:50 - 00:06:35

5 Wild-card queries: *
mon*: find all docs containing any word beginning with “mon”.
Easy with a binary tree (or B-tree) lexicon: retrieve all words w in the range mon ≤ w < moo.
*mon: find words ending in “mon”: harder.
Maintain an additional B-tree for terms written backwards. Then retrieve all words w in the range nom ≤ w < non.
Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent?
00:06:50 - 00:07:15 (mon*)
00:08:15 - 00:08:50 (easy)
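
The two range lookups on this slide can be sketched in Python. This is only an illustration under stated assumptions: a sorted list searched with `bisect` stands in for the B-tree lexicon, the toy lexicon is invented, and the helper name `prefix_range` is hypothetical. The prefix is assumed to be non-empty.

```python
from bisect import bisect_left

def prefix_range(sorted_terms, prefix):
    """Return every term w with prefix <= w < upper, e.g. mon <= w < moo."""
    # Upper bound: bump the last character of the prefix ("mon" -> "moo").
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_terms, prefix)
    hi = bisect_left(sorted_terms, upper)
    return sorted_terms[lo:hi]

# Toy lexicon (assumption, not from the lecture).
lexicon = sorted(["lemon", "money", "monk", "month", "moon", "salmon", "sermon"])

# Second structure: the same terms written backwards, also kept sorted.
reversed_lexicon = sorted(t[::-1] for t in lexicon)

# mon*  -> range mon <= w < moo on the forward lexicon.
starts_with_mon = prefix_range(lexicon, "mon")

# *mon  -> range nom <= w < non on the backward lexicon, then un-reverse.
ends_with_mon = sorted(t[::-1] for t in prefix_range(reversed_lexicon, "nom"))

print(starts_with_mon)
print(ends_with_mon)
```

A real B-tree would answer the same range query by descending to the lower bound and scanning leaves up to the upper bound; the sorted list only mimics that ordering.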

6 B-Tree
00:10:20 - 00:11:35

7 B-Tree
[Figure: B-tree with node labels C, D, I, H, G, K, J, F, E drawn across Level 0, Level 1, Level 2, Level 3]
n = # of pairs. # of external nodes = n + 1.
00:16:50 - 00:17:10
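
The relation between n and the number of external nodes can be checked on a small multiway search tree. The sketch below is an assumption-laden illustration: the `Node` class and the tree shape are hypothetical (the slide's actual figure layout is not recoverable), keys stand in for the pairs, and an external node is modeled as a `None` child pointer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    keys: List[str]
    children: List[Optional["Node"]] = field(default_factory=list)

    def __post_init__(self):
        # A node with k keys has k + 1 child slots; leaves get all-None slots.
        if not self.children:
            self.children = [None] * (len(self.keys) + 1)

def count_pairs(node):
    """n: total number of keys (pairs) stored in the tree."""
    if node is None:
        return 0
    return len(node.keys) + sum(count_pairs(c) for c in node.children)

def count_external(node):
    """Number of external nodes, i.e. empty (None) child slots."""
    if node is None:
        return 1
    return sum(count_external(c) for c in node.children)

# Hypothetical arrangement of the keys C..K; not the slide's actual tree.
root = Node(["F"], [Node(["C", "D", "E"]), Node(["G", "H", "I", "J", "K"])])

n = count_pairs(root)
external = count_external(root)
print(n, external)
```

The identity holds for any shape: each node with k keys contributes k + 1 child slots, and every node except the root fills exactly one slot.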

8 Wild-card queries: *
mon*: find all docs containing any word beginning with “mon”.
Easy with a binary tree (or B-tree) lexicon: retrieve all words w in the range mon ≤ w < moo.
*mon: find words ending in “mon”: harder.
Maintain an additional B-tree for terms written backwards. Then retrieve all words w in the range nom ≤ w < non.
Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent?
00:22:50 - 00:24:00 (Easy)
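
One possible approach to the pro*cent exercise is to intersect the two range lookups: terms beginning with "pro" from the forward lexicon and terms ending with "cent" from the backward lexicon. The sketch below is a suggestion, not the lecture's stated answer; sorted lists stand in for the two B-trees, the toy terms are invented, and `prefix_range` and `wildcard_one_star` are hypothetical names. It assumes exactly one `*` with non-empty text on both sides.

```python
from bisect import bisect_left

def prefix_range(sorted_terms, prefix):
    """All terms w with prefix <= w < prefix-with-last-character-bumped."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return sorted_terms[bisect_left(sorted_terms, prefix):
                        bisect_left(sorted_terms, upper)]

def wildcard_one_star(terms, query):
    head, tail = query.split("*")          # "pro*cent" -> "pro", "cent"
    forward = sorted(terms)                # stands in for the forward B-tree
    backward = sorted(t[::-1] for t in terms)  # stands in for the backward B-tree
    with_prefix = set(prefix_range(forward, head))
    with_suffix = {t[::-1] for t in prefix_range(backward, tail[::-1])}
    # Length check: the prefix and suffix must not overlap inside the term.
    return sorted(t for t in with_prefix & with_suffix
                  if len(t) >= len(head) + len(tail))

# Toy lexicon (assumption).
terms = ["crescent", "percent", "process", "procent",
         "product", "proficient", "recent"]

matches = wildcard_one_star(terms, "pro*cent")
print(matches)
```

The intersection can be expensive when either range is large, which is one motivation for a structure that handles a single `*` in the middle of a query directly.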

9 B+ Tree
00:24:10 - 00:24:50
00:25:10 - 00:26:10

10 Wild-card queries example
Query
Sanat* AND Jayasur*
mo*y
*day
Queries:
X       lookup on X$
X*      lookup on $X*
*X      lookup on X$*
*X*     lookup on X*
X*Y     lookup on Y$X*
X*Y*Z   ??? Exercise!
Sanat AND Jayasur
…yad*
00:31:27 - 00:32:00
00:32:20 - 00:32:45
00:42:00 - 00:42:20
00:38:00 - 00:38:30
00:35:00 - 00:36:00
00:48:34 - 00:50:00
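
The lookup table above rotates each query so that the `*` lands at the end, after which a prefix lookup suffices. A minimal sketch of that idea follows, assuming every term is indexed under all rotations of term + `$`. The function names (`rotations`, `build_permuterm`, `rewrite`, `lookup`) and the toy terms are hypothetical, and a linear scan over a dictionary stands in for the B-tree prefix lookup. The X*Y*Z case is left out, as on the slide.

```python
def rotations(term):
    """All rotations of term + '$', e.g. 'may' -> may$, ay$m, y$ma, $may."""
    s = term + "$"
    return [s[i:] + s[:i] for i in range(len(s))]

def build_permuterm(terms):
    """Map every rotation to the term it came from."""
    index = {}
    for t in terms:
        for r in rotations(t):
            index[r] = t
    return index

def rewrite(query):
    """Rotate the query so that any '*' ends up at the end."""
    if "*" not in query:
        return query + "$"                      # X    -> X$
    if query.count("*") == 2 and query[0] == "*" and query[-1] == "*":
        return query[1:]                        # *X*  -> X*
    head, tail = query.split("*")               # exactly one '*' assumed here
    return tail + "$" + head + "*"              # X*Y  -> Y$X*  (covers X* and *X)

def lookup(index, query):
    key = rewrite(query)
    if not key.endswith("*"):
        return [index[key]] if key in index else []
    prefix = key[:-1]
    # Linear scan as a stand-in for a B-tree prefix range lookup.
    return sorted({t for r, t in index.items() if r.startswith(prefix)})

# Toy terms (assumption).
index = build_permuterm(["may", "monday", "money", "moody", "sunday", "today"])

print(lookup(index, "mo*y"))
print(lookup(index, "*day"))
print(lookup(index, "*on*"))
```

The cost of this convenience is index size: a term of length k contributes k + 1 rotations to the permuterm vocabulary.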

11 Resources
IIR 3, MG 4.2
Efficient spell retrieval:
K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992.
J. Zobel and P. Dart. Finding approximate matches in large lexicons. Software - Practice and Experience 25(3), March.
Mikael Tillenius: Efficient Generation and Ranking of Spelling Error Corrections. Master’s thesis at Sweden’s Royal Institute of Technology.
Nice, easy reading on spell correction:
Peter Norvig: How to write a spelling corrector

