Published by Jamison Gadsden. Modified over 9 years ago.
1
Efficient Interactive Fuzzy Keyword Search
Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine
(2) Tsinghua University
2
Traditional Keyword Search
- Too many results!
- No result!
- Complicated and still no result!
3
Interactive Fuzzy Keyword Search
Features:
- Interactive: data exploration
- Fuzzy: error tolerant
- Multiple keywords: search on the fly
4
Fundamentals
Data
- R: a set of records
- W: a set of distinct words
Query
- Q = {p_1, p_2, …, p_l}: a set of prefixes
- δ: edit-distance threshold
Query result
- R_Q: the set of records such that each record contains all query prefixes or their similar forms (conjunctive)
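The definition of R_Q can be read as a brute-force check, sketched below. This is only an illustration of the semantics, not the indexed method the talk develops. The function names, the whitespace tokenization, and the four sample records are assumptions made for the example.

```python
def prefix_edit_distance(p, word):
    """Smallest edit distance between p and any prefix of word."""
    # row[j] holds the edit distance between the processed part of p and word[:j].
    row = list(range(len(word) + 1))
    for i, pc in enumerate(p, start=1):
        new_row = [i]
        for j, wc in enumerate(word, start=1):
            new_row.append(min(
                row[j] + 1,               # drop pc
                new_row[j - 1] + 1,       # drop wc
                row[j - 1] + (pc != wc),  # match or substitute
            ))
        row = new_row
    # Taking the minimum over the last row considers every prefix of word.
    return min(row)


def query_result(records, prefixes, delta):
    """IDs of records in which every query prefix has a similar word prefix."""
    result = []
    for rid, text in records.items():
        words = text.lower().split()
        if all(any(prefix_edit_distance(p, w) <= delta for w in words)
               for p in prefixes):
            result.append(rid)
    return result


# Hypothetical records, assumed for illustration only.
records = {1: "li data", 6: "vldb lin data", 7: "vldb", 8: "li vldb"}
print(query_result(records, ["vldb", "li"], 0))
print(query_result(records, ["vdb"], 1))
```

With δ = 0 the check reduces to exact prefix matching. With δ = 1 the misspelled prefix vdb still reaches the records containing vldb.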
5
Contributions / Outline
- Step 1: Incremental fuzzy prefix matching
- Step 2: Multi-prefix intersection methods
- Cache-based prefix intersection
6
Observation
W = {exam, example, exemplar, exempt, sample}, δ = 2

Similar prefixes for Q' = exampl:
Prefix    Distance
exam      2
examp     1
exampl    0
example   1
exemp     2
exempt    2
exempl    1
exempla   2
sampl     2

Similar prefixes for Q = example:
Prefix    Distance
examp     2
exampl    1
example   0
exempl    2
exempla   2
sample    2

Transitions from Q' to Q labeled on the slide: delete e, match e, delete e, substitute e with a, match e.
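The two tables can be reproduced by brute force, which is a useful reference when checking a faster method. The sketch below enumerates every prefix of every word in W and keeps those within edit distance δ of the query. The function names are assumptions made for the example.

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    row = list(range(len(b) + 1))
    for i, ac in enumerate(a, start=1):
        new_row = [i]
        for j, bc in enumerate(b, start=1):
            new_row.append(min(
                row[j] + 1,               # delete ac
                new_row[j - 1] + 1,       # insert bc
                row[j - 1] + (ac != bc),  # match or substitute
            ))
        row = new_row
    return row[-1]


def similar_prefixes(words, query, delta):
    """Map each prefix of a word in `words` to its distance from `query`, if <= delta."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}
    table = {}
    for p in prefixes:
        d = edit_distance(p, query)
        if d <= delta:
            table[p] = d
    return table


W = ["exam", "example", "exemplar", "exempt", "sample"]
for prefix, dist in sorted(similar_prefixes(W, "example", 2).items()):
    print(prefix, dist)
```

This recomputes every distance from scratch for each query. The observation on the slide is that the table for Q can instead be derived from the table for Q'.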
7
Trie Indexing
Computing the set of active nodes Φ_Q:
- Initialization
- Incremental step

[Figure: trie over the words exam, example, exemplar, exempt, sample, each ending in a $ leaf, with the active nodes for Q = example labeled with their distances.]

Active nodes for Q = example:
Prefix    Distance
examp     2
exampl    1
example   0
exempl    2
exempla   2
sample    2
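A minimal trie sketch for the word set on the slide follows. The class and function names are assumptions, and the `is_word` flag stands in for the $ leaf drawn in the figure.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # plays the role of the '$' leaf on the slide


def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find(root, prefix):
    """Return the node reached by following prefix, or None if it is absent."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def words_below(node, prefix=""):
    """All complete words in the subtree of node, in alphabetical order."""
    out = [prefix] if node.is_word else []
    for ch in sorted(node.children):
        out.extend(words_below(node.children[ch], prefix + ch))
    return out


W = ["exam", "example", "exemplar", "exempt", "sample"]
root = build_trie(W)
print(words_below(find(root, "exe"), "exe"))
```

Each trie node corresponds to one prefix, so an active node can be stored as a (node, distance) pair.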
8
Initialization
Q = ε
Initialize Φ_ε with all trie nodes within a depth of δ.

Active nodes for Q = ε:
Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2
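The initialization can be sketched as a breadth-first walk that stops at depth δ. For brevity the trie is represented here as a dictionary from each node (named by its prefix string) to its child characters. That representation and the function names are assumptions of this sketch.

```python
def build_children(words):
    """Trie as a dict: node (named by its prefix) -> set of child characters."""
    children = {}
    for w in words:
        for i, ch in enumerate(w):
            children.setdefault(w[:i], set()).add(ch)
    return children


def initial_active_nodes(children, delta):
    """Active nodes for the empty query: every node at depth <= delta."""
    active = {"": 0}
    frontier = [""]
    for depth in range(1, delta + 1):
        next_frontier = []
        for node in frontier:
            for ch in sorted(children.get(node, ())):
                # A node at this depth is reached by inserting `depth` characters.
                active[node + ch] = depth
                next_frontier.append(node + ch)
        frontier = next_frontier
    return active


W = ["exam", "example", "exemplar", "exempt", "sample"]
print(initial_active_nodes(build_children(W), 2))
# {'': 0, 'e': 1, 's': 1, 'ex': 2, 'sa': 2}
```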
9
Incremental Computation: Algorithm
Incremental computation from Φ_Q' to Φ_Q, where Q extends Q' by one character.
add(Φ_Q, ⟨n, d⟩) has an effect only if Φ_Q contains no active node with the same n and a smaller d.

FOR EACH ⟨n, d⟩ FROM Φ_Q'
  Deletion:      add(Φ_Q, ⟨n, d+1⟩)
  Substitution:  FOR EACH n' FROM non-matching children of n
                   add(Φ_Q, ⟨n', d+1⟩)
  Match:         add(Φ_Q, ⟨m, d⟩)    (m is the matching child of n)
  Insertion:     FOR EACH m' FROM descendants of m
                   add(Φ_Q, ⟨m', d+x⟩)    (x is the distance from m' to m)
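The four cases above can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: nodes are named by their prefix strings, the trie is a dictionary of child characters, entries with distance above δ are discarded inside `add`, and all function names are hypothetical.

```python
def build_children(words):
    """Trie as a dict: node (named by its prefix) -> set of child characters."""
    children = {}
    for w in words:
        for i, ch in enumerate(w):
            children.setdefault(w[:i], set()).add(ch)
    return children


def add(active, node, dist, delta):
    """Keep the entry only if it is within delta and improves on any existing one."""
    if dist <= delta and dist < active.get(node, delta + 1):
        active[node] = dist


def descendants(children, node, max_depth):
    """Yield (descendant, depth below node) for depths 1..max_depth."""
    stack = [(node, 0)]
    while stack:
        n, x = stack.pop()
        if x > 0:
            yield n, x
        if x < max_depth:
            for ch in children.get(n, ()):
                stack.append((n + ch, x + 1))


def step(children, prev_active, ch, delta):
    """Compute the active nodes of Q = Q' + ch from those of Q'."""
    active = {}
    for n, d in prev_active.items():
        add(active, n, d + 1, delta)                   # deletion
        for c in children.get(n, ()):
            if c != ch:
                add(active, n + c, d + 1, delta)       # substitution
            else:
                m = n + c
                add(active, m, d, delta)               # match
                for m2, x in descendants(children, m, delta - d):
                    add(active, m2, d + x, delta)      # insertion
    return active


def active_nodes(children, query, delta):
    # Initialization: every node within depth delta, at distance equal to its depth.
    active = {"": 0}
    for n, chars in children.items():
        if len(n) + 1 <= delta:
            for c in chars:
                active[n + c] = len(n) + 1
    for ch in query:
        active = step(children, active, ch, delta)
    return active


W = ["exam", "example", "exemplar", "exempt", "sample"]
children = build_children(W)
for prefix, dist in sorted(active_nodes(children, "example", 2).items()):
    print(prefix, dist)
```

Each keystroke touches only the previous active set and small neighborhoods around it, so the cost of a step depends on the size of Φ_Q' rather than on the size of the trie.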
10
Incremental Computation: Example
Q = e

Active nodes for Q = ε:
Prefix    Distance
ε         0
e         1
ex        2
s         1
sa        2

Computing the active nodes for Q = e:
Prefix    # Op    Base    Op
ε         1       ε       del e
s         1       ε       sub e/s
e         0       ε       mat e
ex        1       ε       ins x
exa       2       ε       ins xa
exe       2       ε       ins xe
e         2       e       del e
ex        2       e       sub e/x
ex        3       ex      del e
exa       3       ex      sub e/a
exe       2       ex      mat e
s         2       s       del e
sa        2       s       sub e/a
sa        3       sa      del e
11
Incremental Computation: Discussion
Insertions
- Needed after matches
- Not needed after deletions and substitutions:
  - deletions and insertions do not co-occur in adjacent positions
  - adjacent substitutions and insertions are interchangeable
Correctness and completeness
- Can be proved by reduction from/to edit-distance computation
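One way to see the connection to edit-distance computation is the standard recurrence below, written over trie nodes. It is a sketch of the reasoning, not the proof from the talk. The notation is an assumption: p(n) is the parent of node n, c(n) is the character on the edge from p(n) to n, and Q = Q'c is the query extended by character c.

```latex
\[
\mathrm{ed}(n, Q'c) = \min
\begin{cases}
\mathrm{ed}(n, Q') + 1 & \text{(deletion of } c\text{)} \\
\mathrm{ed}(p(n), Q') + [\,c(n) \neq c\,] & \text{(match or substitution)} \\
\mathrm{ed}(p(n), Q'c) + 1 & \text{(insertion of } c(n)\text{)}
\end{cases}
\]
```

Only the third case refers to the new set Φ_Q rather than to Φ_Q', and it corresponds to walking down to descendants. If such a walk of x steps started from a node a that was itself reached by a deletion, the same cost is already covered by a deletion applied directly at the descendant n, because ed(n, Q') ≤ ed(a, Q') + x. This matches the slide's remark that insertions only need to be expanded after matches.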
12
Outline
- Step 1: Incremental fuzzy prefix matching
- Step 2: Multi-prefix intersection methods
- Cache-based prefix intersection
13
Multi-Prefix Intersection
Q = vldb li
Multi-prefix intersection: return the records in which every query keyword appears as a prefix (or in a similar form).

ID    Record
1     Li data…
2     data…
3     data Lin…
4     Lu Lin Luis…
5     Liu…
6     VLDB Lin data…
7     VLDB…
8     Li VLDB…

Results for Q = vldb li:
ID    Record
6     VLDB Lin data…
8     Li VLDB…
14
Multi-Prefix Intersection: Method 1
Q = vldb li

Inverted lists at the trie leaves (record IDs as in the table on slide 13):
Keyword    Record IDs
data       1 2 3 6
li         1 8
lin        3 4 6
liu        5
lu         4
luis       4
vldb       6 7 8

Union of the inverted lists under each prefix, then intersection:
li         1 3 4 5 6 8
vldb       6 7 8
Result     6 8

Space cost: inverted index
Time cost: union + intersection
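Method 1 can be sketched as below for exact prefixes. The record texts contain only the keywords visible on the slide (the elided parts are left out). Scanning the index keys stands in for walking the trie subtree of a prefix, and the function names are assumptions of this sketch.

```python
from collections import defaultdict

# Keywords as shown on the slide; elided record content is omitted.
RECORDS = {
    1: "Li data", 2: "data", 3: "data Lin", 4: "Lu Lin Luis",
    5: "Liu", 6: "VLDB Lin data", 7: "VLDB", 8: "Li VLDB",
}


def build_inverted_index(records):
    """Map each keyword to the set of record IDs that contain it."""
    index = defaultdict(set)
    for rid, text in records.items():
        for word in text.lower().split():
            index[word].add(rid)
    return index


def union_list(index, prefix):
    """Union of the inverted lists of all keywords starting with prefix."""
    ids = set()
    for word, rids in index.items():
        if word.startswith(prefix):
            ids |= rids
    return ids


def search(index, query):
    """Intersect the union lists of all query prefixes."""
    lists = [union_list(index, p) for p in query.lower().split()]
    return sorted(set.intersection(*lists)) if lists else []


index = build_inverted_index(RECORDS)
print(sorted(union_list(index, "li")))
print(search(index, "vldb li"))
```

The union for a short prefix such as li can be long, which is where the time cost of this method comes from.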
15
Multi-Prefix Intersection: Method 2
Q = vldb li

Keyword IDs (keywords in sorted order):
ID    Keyword
1     data
2     li
3     lin
4     liu
5     lu
6     luis
7     vldb

Keyword-ID ranges stored at the trie nodes:
Node      Range
(root)    [1, 7]
data      [1, 1]
l         [2, 6]
li        [2, 4]
lin       [3, 3]
liu       [4, 4]
lu        [5, 6]
luis      [6, 6]
vldb      [7, 7]

Forward lists (keyword IDs per record):
ID    Record            Forward list
1     Li data…          1 2
2     data…             1
3     data Lin…         1 3
4     Lu Lin Luis…      3 5 6
5     Liu…              4
6     VLDB Lin data…    1 3 7
7     VLDB…             7
8     Li VLDB…          2 7

Read each record on the inverted list of vldb (6 7 8), then verify it by probing its forward list for a keyword ID in [2, 4], the range of li.
Results:
6     VLDB Lin data…    1 3 7
8     Li VLDB…          2 7

Space cost: inverted + forward index
Time cost: probing forward lists
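The probing step can be sketched with binary search over sorted forward lists. This sketch handles exact prefixes only, uses the first query prefix to pick the records to traverse, and includes only the keywords visible on the slide. The function names are assumptions.

```python
from bisect import bisect_left

RECORDS = {
    1: "Li data", 2: "data", 3: "data Lin", 4: "Lu Lin Luis",
    5: "Liu", 6: "VLDB Lin data", 7: "VLDB", 8: "Li VLDB",
}


def build_indexes(records):
    """Return the sorted vocabulary, forward lists, and inverted lists."""
    vocab = sorted({w for t in records.values() for w in t.lower().split()})
    word_id = {w: i for i, w in enumerate(vocab, start=1)}
    forward = {rid: sorted({word_id[w] for w in t.lower().split()})
               for rid, t in records.items()}
    inverted = {}
    for rid in sorted(forward):
        for wid in forward[rid]:
            inverted.setdefault(wid, []).append(rid)
    return vocab, forward, inverted


def id_range(vocab, prefix):
    """Keyword-ID range (lo, hi) of the words starting with prefix, or None."""
    lo = bisect_left(vocab, prefix)
    hi = lo
    while hi < len(vocab) and vocab[hi].startswith(prefix):
        hi += 1
    return (lo + 1, hi) if hi > lo else None


def probe(forward_list, lo, hi):
    """True if the sorted forward list holds a keyword ID in [lo, hi]."""
    i = bisect_left(forward_list, lo)
    return i < len(forward_list) and forward_list[i] <= hi


def search(vocab, forward, inverted, query):
    ranges = [id_range(vocab, p) for p in query.lower().split()]
    if not ranges or any(r is None for r in ranges):
        return []
    (lo0, hi0), rest = ranges[0], ranges[1:]
    # Read each record under the first prefix, then probe for the other prefixes.
    candidates = sorted({rid for wid in range(lo0, hi0 + 1)
                         for rid in inverted.get(wid, [])})
    return [rid for rid in candidates
            if all(probe(forward[rid], lo, hi) for lo, hi in rest)]


vocab, forward, inverted = build_indexes(RECORDS)
print(id_range(vocab, "li"))
print(search(vocab, forward, inverted, "vldb li"))
```

Because keyword IDs follow sorted order, all keywords sharing a prefix form one contiguous ID range, so a single binary search per record replaces the union of many inverted lists.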
16
Cache-Based Prefix Intersection
Example
- Q1 = cs co
- Q2 = cs conf
- Q3 = cs conf vanc
Simple method
- compute and cache the entire answers of Q1
- compute the answers of Q2 from Q1 incrementally
Problem
- the answers of Q1 may be very large and not completely used in the displayed results
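The simple method can be sketched as a cache that keeps the full answer set of the previous query and filters it when the new query is a refinement. The class name, the sample records, and the exact-prefix matching are assumptions made for the example.

```python
def matches(text, prefixes):
    """True if every prefix matches the start of some word in text."""
    words = text.lower().split()
    return all(any(w.startswith(p) for w in words) for p in prefixes)


def extends(prev, cur):
    """True if every previous prefix is a prefix of some current prefix."""
    return all(any(q.startswith(p) for q in cur) for p in prev)


class SimpleCache:
    def __init__(self, records):
        self.records = records
        self.prev_prefixes = None
        self.prev_ids = None

    def search(self, query):
        prefixes = query.lower().split()
        candidates = list(self.records)
        # A refinement can only shrink the answer set, so filter the cached answers.
        if self.prev_prefixes is not None and extends(self.prev_prefixes, prefixes):
            candidates = self.prev_ids
        ids = [rid for rid in candidates if matches(self.records[rid], prefixes)]
        self.prev_prefixes, self.prev_ids = prefixes, ids
        return ids


# Hypothetical records, assumed for illustration only.
records = {
    1: "cs conference vancouver",
    2: "cs course",
    3: "cs conference irvine",
    4: "math conference",
}
cache = SimpleCache(records)
print(cache.search("cs co"))
print(cache.search("cs conf"))
print(cache.search("cs conf vanc"))
```

The weakness named on the slide is visible here: `prev_ids` holds every answer of Q1, even though only a few of them are ever displayed.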
17
Cache-Based Prefix Intersection
Reducing cached answers
- compute and cache only the needed answers
- for subsequent queries, compute the answers:
  - from the cached answers
  - by resuming the previously terminated computation

Q = cs co: compute the cached answers of cs co (traversal list: inverted list of cs)
Q = cs conf: verify the cached answers of cs co, then resume the computation to obtain the cached answers of cs conf
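A sketch of the reduced cache follows. It stops scanning the traversal list once k answers are found, remembers the stop position, and on a refined query first verifies the cached answers and then resumes the scan. The class name, the parameter k, and the sample records are assumptions. The sketch also assumes that each call refines the previous query, which is what makes skipping the already scanned records safe.

```python
def matches(text, prefixes):
    """True if every prefix matches the start of some word in text."""
    words = text.lower().split()
    return all(any(w.startswith(p) for w in words) for p in prefixes)


class ResumableSearch:
    def __init__(self, records, traversal_list, k):
        self.records = records
        self.traversal = traversal_list  # e.g. the inverted list of the first keyword
        self.k = k                       # number of answers needed for display
        self.cached = []                 # answers found so far
        self.pos = 0                     # where the previous scan terminated

    def refine(self, prefixes):
        # Step 1: verify the cached answers against the refined query.
        self.cached = [rid for rid in self.cached
                       if matches(self.records[rid], prefixes)]
        # Step 2: resume the terminated scan until k answers are available.
        while len(self.cached) < self.k and self.pos < len(self.traversal):
            rid = self.traversal[self.pos]
            self.pos += 1
            if matches(self.records[rid], prefixes):
                self.cached.append(rid)
        return list(self.cached)


# Hypothetical records, assumed for illustration only.
records = {
    1: "cs course",
    2: "cs conference irvine",
    3: "cs colloquium",
    4: "cs conference vancouver",
    5: "cs conf vancouver workshop",
}
search = ResumableSearch(records, traversal_list=[1, 2, 3, 4, 5], k=2)
print(search.refine(["cs", "co"]))
print(search.refine(["cs", "conf"]))
print(search.refine(["cs", "conf", "vanc"]))
```

Records before the stop position that failed an earlier query also fail every refinement of it, so they never need to be revisited.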
18
Additional Features
- Ranking
- Highlighting
- Synonyms
19
Experimental Results: Computing similar prefixes
20
Experimental Results: Multi-prefix intersection
21
Experimental Results: Overall scalability
22
Thank You! Questions?
TASTIER: Efficient Auto-Completion, Type-Ahead Search
http://tastier.ics.uci.edu/