INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID Lecture # 18 Wild Card Queries B Tree
ACKNOWLEDGEMENTS The presentation of this lecture has been taken from the underline sources “Introduction to information retrieval” by Prabhakar Raghavan, Christopher D. Manning, and Hinrich Schütze “Managing gigabytes” by Ian H. Witten, Alistair Moffat, Timothy C. Bell “Modern information retrieval” by Baeza-Yates Ricardo, “Web Information Retrieval” by Stefano Ceri, Alessandro Bozzon, Marco Brambilla
Outline How to Handle Wild-Card Queries Wild-card queries: * B-Tree
How to Handle Wild-Card Queries B-Trees Permuterm Index K-Grams Soundex Algorithms 00:03:50 00:06:35
Wild-card queries: * mon*: find all docs containing any word beginning with “mon”. Easy with binary tree (or B-tree) lexicon: retrieve all words in range: mon ≤ w < moo *mon: find words ending in “mon”: harder Maintain an additional B-tree for terms backwards. Can retrieve all words in range: nom ≤ w < non. 00:06:50 00:07:15(mon*) 00:08:15 00:08:50(easy) Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent ?
B-Tree 00:10:20 00:11:35
B-Tree n = # of pairs. # of external nodes = n + 1. C D I H G K J F E Level 0 Level 1 Level 2 Level 3 00:16:50 00:17:10 n = # of pairs. # of external nodes = n + 1.
Wild-card queries: * mon*: find all docs containing any word beginning with “mon”. Easy with binary tree (or B-tree) lexicon: retrieve all words in range: mon ≤ w < moo *mon: find words ending in “mon”: harder Maintain an additional B-tree for terms backwards. Can retrieve all words in range: nom ≤ w < non. 00:22:50 00:24:00(Easy) Exercise: from this, how can we enumerate all terms meeting the wild-card query pro*cent ?
B+ Tree 00:24:10 00:24:50 00:25:10 00:26:10
Wild-card queries example Query Sanat* AND Jayasur* mo*y *day Queries: X lookup on X$ X* lookup on $X* *X lookup on X$* *X* lookup on X* X*Y lookup on Y$X* X*Y*Z ??? Exercise! Sanat AND Jayasur …yad* 00:31:27 00:32:00 00:32:20 00:32:45 00:42:00 00:42:20 00:38:00 00:38:30 00:35:00 00:36:00 00:48:34 00:50:00
Resources Peter Norvig: How to write a spelling corrector IIR 3, MG 4.2 Efficient spell retrieval: K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), Dec 1992. J. Zobel and P. Dart. Finding approximate matches in large lexicons. Software - practice and experience 25(3), March 1995. http://citeseer.ist.psu.edu/zobel95finding.html Mikael Tillenius: Efficient Generation and Ranking of Spelling Error Corrections. Master’s thesis at Sweden’s Royal Institute of Technology. http://citeseer.ist.psu.edu/179155.html Nice, easy reading on spell correction: Peter Norvig: How to write a spelling corrector http://norvig.com/spell-correct.html