Efficient Interactive Fuzzy Keyword Search
Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine; (2) Tsinghua University


1 Efficient Interactive Fuzzy Keyword Search
Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine; (2) Tsinghua University

2 Traditional Keyword Search
UC Irvine & Tsinghua. Efficient Interactive Fuzzy Keyword Search. Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng.
- Too many results!
- No result!
- Complicated and still no result!

3 Interactive Fuzzy Keyword Search
Features:
- Interactive: data exploration
- Fuzzy: error tolerant
- Multiple keywords: search on-the-fly

4 Fundamentals
- Data
  - R: a set of records
  - W: a set of distinct words
- Query
  - Q = {p_1, p_2, …, p_l}: a set of prefixes
  - δ: edit-distance threshold
- Query result
  - R_Q: the set of records in which every query prefix, or a similar form of it (within edit distance δ), appears (conjunctive semantics)
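
The query semantics above can be made concrete with a brute-force reference check. This is a minimal sketch, not the indexing method of the talk: it assumes that a "similar form" of a query prefix p is any prefix of a record word whose edit distance to p is at most δ. The function names are illustrative, and the toy records reuse only the words visible on the Multi-Prefix Intersection slide (lowercased, elided text omitted).

```python
def prefix_edit_distance(query_prefix: str, word: str) -> int:
    """Smallest edit distance between query_prefix and any prefix of word."""
    # prev[j] holds the edit distance between the processed part of
    # query_prefix and word[:j]; it starts as the row for the empty string.
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query_prefix, 1):
        cur = [i]
        for j, wc in enumerate(word, 1):
            cur.append(min(prev[j] + 1,                # drop a query character
                           cur[j - 1] + 1,             # add a word character
                           prev[j - 1] + (qc != wc)))  # match or substitute
        prev = cur
    # The last row covers every prefix of word, so take its minimum.
    return min(prev)


def fuzzy_prefix_search(records, query, delta):
    """Return IDs of records that match every query prefix within delta."""
    result = []
    for rid, words in sorted(records.items()):
        if all(any(prefix_edit_distance(p, w) <= delta for w in words)
               for p in query):
            result.append(rid)
    return result


# Toy records (assumption: only the words shown on the slide).
RECORDS = {
    1: ["li", "data"],
    2: ["data"],
    3: ["data", "lin"],
    4: ["lu", "lin", "luis"],
    5: ["liu"],
    6: ["vldb", "lin", "data"],
    7: ["vldb"],
    8: ["li", "vldb"],
}
```

With δ = 1 the misspelled query `["vdb", "li"]` still reaches the records containing VLDB and a word starting with li, while δ = 0 falls back to exact prefix matching and finds nothing.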

5 Contributions / Outline
- Step 1
  - Incremental fuzzy prefix matching
- Step 2
  - Multi-prefix intersection methods
  - Cache-based prefix intersection

6 Observation
- W = {exam, example, exemplar, exempt, sample}
- δ = 2

Similar prefixes for Q' = exampl:
  Prefix    Distance
  exam      2
  examp     1
  exampl    0
  example   1
  exemp     2
  exempt    2
  exempl    1
  exempla   2
  sampl     2

Similar prefixes for Q = example:
  Prefix    Distance
  examp     2
  exampl    1
  example   0
  exempl    2
  exempla   2
  sample    2

Arrow labels from the first table to the second (appending e to Q'): delete e, match e, delete e, substitute e with a, match e.

7 Trie Indexing
Computing the set of active nodes Φ_Q:
- Initialization
- Incremental step

[Figure: trie over the words exam, example, exemplar, exempt, sample, with each active node labeled by its distance]

Active nodes for Q = example:
  Prefix    Distance
  examp     2
  exampl    1
  example   0
  exempl    2
  exempla   2
  sample    2

8 Initialization
- Q = ε
- Initialize Φ_ε with all trie nodes within a depth of δ

[Figure: trie over the words exam, example, exemplar, exempt, sample, with the initial active nodes labeled by distance]

Active nodes for Q = ε:
  Prefix    Distance
  ε         0
  e         1
  ex        2
  s         1
  sa        2

9 Incremental Computation: Algorithm
- Incremental computation from Φ_Q' to Φ_Q
- add(Φ_Q, <n, d>) has an effect only if Φ_Q contains no active node with the same n and a smaller d

Algorithm details:
  FOR EACH <n, d> FROM Φ_Q'
    Deletion:      add(Φ_Q, <n, d+1>)
    Substitution:  FOR EACH n' FROM non-matching children of n
                     add(Φ_Q, <n', d+1>)
    Match:         add(Φ_Q, <m, d>)        (m is the matching child of n)
    Insertion:     FOR EACH m' FROM descendants of m
                     add(Φ_Q, <m', d+x>)   (x is the distance from m' to m)
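
The initialization and the incremental step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: trie nodes are identified by their prefix strings, the $ end-of-word markers are omitted, entries whose distance exceeds δ are dropped, and all function names are illustrative.

```python
def build_trie(words):
    """Map each trie node (its prefix string) to {char: child prefix}."""
    children = {}
    for w in words:
        for k in range(len(w)):
            children.setdefault(w[:k], {})[w[k]] = w[:k + 1]
    return children


def descendants(children, node, max_depth):
    """Yield (descendant, depth below node) for depths 1..max_depth."""
    stack = [(node, 0)]
    while stack:
        n, depth = stack.pop()
        if depth >= max_depth:
            continue
        for child in children.get(n, {}).values():
            yield child, depth + 1
            stack.append((child, depth + 1))


def add(active, node, dist, delta):
    """Keep <node, dist> only if it is within delta and improves on active."""
    if dist <= delta and dist < active.get(node, delta + 1):
        active[node] = dist


def initial_active_nodes(children, delta):
    """Active nodes for the empty query: all nodes within depth delta."""
    active = {"": 0}
    for node, depth in descendants(children, "", delta):
        active[node] = depth
    return active


def extend(children, prev_active, ch, delta):
    """Compute the active nodes of Q = Q' + ch from those of Q'."""
    active = {}
    for n, d in prev_active.items():
        add(active, n, d + 1, delta)                  # deletion
        for c, child in children.get(n, {}).items():
            if c != ch:
                add(active, child, d + 1, delta)      # substitution
            else:
                add(active, child, d, delta)          # match
                for m2, x in descendants(children, child, delta - d):
                    add(active, m2, d + x, delta)     # insertion
    return active


def active_nodes(children, query, delta):
    """Type the query one character at a time, starting from the empty query."""
    active = initial_active_nodes(children, delta)
    for ch in query:
        active = extend(children, active, ch, delta)
    return active
```

Each keystroke only touches the previous active-node set and the trie neighborhood around it, which is what makes the computation incremental. For W = {exam, example, exemplar, exempt, sample} and δ = 2, typing example one character at a time ends with the six prefixes listed for Q = example on the Observation slide.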

10 Incremental Computation: Example
- Q = e

[Figure: trie over the words exam, example, exemplar, exempt, sample, with active nodes labeled by distance]

Active nodes for Q = ε:
  Prefix    Distance
  ε         0
  e         1
  ex        2
  s         1
  sa        2

Candidates generated when e is appended:
  Prefix    # Ops    Base    Op
  ε         1        ε       del e
  s         1        ε       sub e/s
  e         0        ε       mat e
  ex        1        ε       ins x
  exa       2        ε       ins xa
  exe       2        ε       ins xe
  e         2        e       del e
  ex        2        e       sub e/x
  ex        3        ex      del e
  exa       3        ex      sub e/a
  exe       2        ex      mat e
  s         2        s       del e
  sa        2        s       sub e/a
  sa        3        sa      del e

Active nodes for Q = e (smallest count per node, at most δ = 2):
  Prefix    Distance
  ε         1
  s         1
  e         0
  ex        1
  exa       2
  exe       2
  sa        2

11 Incremental Computation: Discussion
- Insertions
  - Needed after matches
  - Not needed after deletions and substitutions:
    - deletions and insertions do not co-occur in adjacent positions
    - adjacent substitutions and insertions are interchangeable
- Correctness and completeness
  - Can be proved by reduction from/to edit-distance computation

12 Outline
- Step 1
  - Incremental fuzzy prefix matching
- Step 2
  - Multi-prefix intersection methods
  - Cache-based prefix intersection

13 Multi-Prefix Intersection
- Q = vldb li
- Multi-prefix intersection
  - Return the records that contain every query keyword as a prefix (or a similar form of it)

Records:
  ID   Record
  1    Li data…
  2    data…
  3    data Lin…
  4    Lu Lin Luis…
  5    Liu…
  6    VLDB Lin data…
  7    VLDB…
  8    Li VLDB…

Results:
  ID   Record
  6    VLDB Lin data…
  8    Li VLDB…

14 Multi-Prefix Intersection: Method 1
- Q = vldb li

Records: the eight records of slide 13.

[Figure: trie over the record keywords, with an inverted list attached to each keyword]

Inverted lists:
  Keyword   Record IDs
  data      1 2 3 6
  li        1 8
  lin       3 4 6
  liu       5
  lu        4
  luis      4
  vldb      6 7 8

Union for li:    1 3 4 5 6 8
Union for vldb:  6 7 8
Intersection:    6 8

Space cost: inverted index
Time cost: union + intersection
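
Method 1 can be sketched as a union per prefix followed by an intersection across prefixes. This is a minimal sketch with illustrative names: it scans a dictionary instead of walking a trie, uses exact prefix matching as in the slide's example, and takes its toy records from the words visible on the slide (lowercased, elided text omitted).

```python
# Toy records (assumption: only the words shown on the slide).
RECORDS = {
    1: ["li", "data"],
    2: ["data"],
    3: ["data", "lin"],
    4: ["lu", "lin", "luis"],
    5: ["liu"],
    6: ["vldb", "lin", "data"],
    7: ["vldb"],
    8: ["li", "vldb"],
}


def build_inverted_index(records):
    """Map each keyword to the set of record IDs that contain it."""
    index = {}
    for rid, words in records.items():
        for w in words:
            index.setdefault(w, set()).add(rid)
    return index


def union_for_prefix(index, prefix):
    """Union the inverted lists of all keywords that start with prefix."""
    ids = set()
    for word, rids in index.items():
        if word.startswith(prefix):
            ids |= rids
    return ids


def search_union_intersect(index, prefixes):
    """Intersect the per-prefix unions; return sorted record IDs."""
    result = None
    for p in prefixes:
        union = union_for_prefix(index, p)
        result = union if result is None else result & union
    return sorted(result) if result else []


INDEX = build_inverted_index(RECORDS)
```

The cost profile matches the slide: only the inverted index is stored, but every query pays for building a union per prefix before intersecting, and the union for a short prefix such as li can be long.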

15 Multi-Prefix Intersection: Method 2
- Q = vldb li

[Figure: trie over the record keywords, with a keyword-ID range on each node and an inverted list on each keyword]

Keyword IDs: data 1, li 2, lin 3, liu 4, lu 5, luis 6, vldb 7
Node ranges: root [1, 7], data [1, 1], l [2, 6], li [2, 4], lin [3, 3], liu [4, 4], lu [5, 6], luis [6, 6], vldb [7, 7]

Forward lists:
  ID   Record            Forward list
  1    Li data…          1 2
  2    data…             1
  3    data Lin…         1 3
  4    Lu Lin Luis…      3 5 6
  5    Liu…              4
  6    VLDB Lin data…    1 3 7
  7    VLDB…             7
  8    Li VLDB…          2 7

Read each record on the inverted list of vldb (6 7 8), then verify it by probing its forward list for a keyword ID in [2, 4], the range of li:
  ID   Record            Forward list
  6    VLDB Lin data…    1 3 7
  8    Li VLDB…          2 7

Space cost: inverted + forward index
Time cost: probing forward lists
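
Method 2 can be sketched with sorted keyword IDs, so that every prefix corresponds to a contiguous ID range and a forward list can be probed by binary search. This is a minimal sketch with illustrative names: it uses a sorted keyword array in place of the trie, exact prefix matching, and toy records built from the words visible on the slide. Traversing the list of the first query prefix is an assumption of the sketch, not a rule stated on the slide.

```python
from bisect import bisect_left

# Toy records (assumption: only the words shown on the slide).
RECORDS = {
    1: ["li", "data"],
    2: ["data"],
    3: ["data", "lin"],
    4: ["lu", "lin", "luis"],
    5: ["liu"],
    6: ["vldb", "lin", "data"],
    7: ["vldb"],
    8: ["li", "vldb"],
}

# Keyword IDs follow alphabetical order, so a prefix maps to an ID range.
KEYWORDS = sorted({w for words in RECORDS.values() for w in words})
KEYWORD_ID = {w: i + 1 for i, w in enumerate(KEYWORDS)}

# Forward list: the sorted keyword IDs of each record.
FORWARD = {rid: sorted(KEYWORD_ID[w] for w in words)
           for rid, words in RECORDS.items()}

# Inverted list: the record IDs of each keyword, in increasing order.
INVERTED = {}
for rid in sorted(RECORDS):
    for w in RECORDS[rid]:
        INVERTED.setdefault(w, []).append(rid)


def keyword_range(prefix):
    """Return the (low, high) keyword-ID range of prefix, or None."""
    lo = bisect_left(KEYWORDS, prefix)
    hi = lo
    while hi < len(KEYWORDS) and KEYWORDS[hi].startswith(prefix):
        hi += 1
    return (lo + 1, hi) if hi > lo else None


def probe(forward_list, id_range):
    """Check by binary search whether forward_list has an ID in id_range."""
    lo, hi = id_range
    pos = bisect_left(forward_list, lo)
    return pos < len(forward_list) and forward_list[pos] <= hi


def search_forward(prefixes):
    """Traverse the records of the first prefix and probe for the others."""
    ranges = [keyword_range(p) for p in prefixes]
    if any(r is None for r in ranges):
        return []
    first_lo, first_hi = ranges[0]
    traversal = sorted({rid
                        for kid in range(first_lo, first_hi + 1)
                        for rid in INVERTED[KEYWORDS[kid - 1]]})
    return [rid for rid in traversal
            if all(probe(FORWARD[rid], r) for r in ranges[1:])]
```

Compared with Method 1, no union is built for li: each candidate record costs one binary search per remaining prefix, at the price of storing the forward lists.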

16 Cache-Based Prefix Intersection
- Example
  - Q_1 = cs co
  - Q_2 = cs conf
  - Q_3 = cs conf vanc
- Simple method
  - Compute and cache the entire answer set of Q_1
  - Compute the answers of Q_2 incrementally from those of Q_1
- Problem
  - The answer set of Q_1 may be very large, and only part of it is used in the results displayed

17 Cache-Based Prefix Intersection
- Reducing cached answers
  - Compute and cache only the answers that are needed
  - For subsequent queries, compute the answers:
    - from the cached answers
    - by resuming the previously terminated computation

[Figure: for Q = cs co, the traversal list (the inverted list of cs) is scanned to compute the cached answers of cs co; for Q = cs conf, those cached answers are verified and the scan is resumed to compute the cached answers of cs conf]
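
The verify-then-resume idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: only k answers are needed for display, matching is exact prefix matching, each new query refines the cached one (so a record rejected earlier can never match later), and the records, the value of k, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedQuery:
    answers: list   # record IDs found so far for the cached query
    position: int   # index in the traversal list where the scan stopped


def matches(words, prefix):
    """True if some word of the record starts with prefix."""
    return any(w.startswith(prefix) for w in words)


def answer(records, traversal_list, other_prefixes, k, cache=None):
    """Return up to k answers, reusing and extending a cached result."""
    if cache is None:
        cache = CachedQuery([], 0)

    def ok(rid):
        return all(matches(records[rid], p) for p in other_prefixes)

    # Step 1: verify the cached answers against the refined query.
    answers = [rid for rid in cache.answers if ok(rid)]
    # Step 2: resume the scan where the earlier computation stopped.
    pos = cache.position
    while len(answers) < k and pos < len(traversal_list):
        rid = traversal_list[pos]
        pos += 1
        if ok(rid):
            answers.append(rid)
    return CachedQuery(answers, pos)


# Hypothetical toy data: every record contains cs, so the traversal list
# (the inverted list of cs) holds all record IDs.
RECORDS = {
    1: ["cs", "conference"],
    2: ["cs", "course"],
    3: ["cs", "conf", "vancouver"],
    4: ["cs", "compiler"],
    5: ["cs", "conference", "vancouver"],
}
CS_LIST = [1, 2, 3, 4, 5]

q1 = answer(RECORDS, CS_LIST, ["co"], k=2)                    # cs co
q2 = answer(RECORDS, CS_LIST, ["conf"], k=2, cache=q1)        # cs conf
q3 = answer(RECORDS, CS_LIST, ["conf", "vanc"], k=2, cache=q2)  # cs conf vanc
```

Each keystroke therefore inspects only the cached answers plus the unscanned tail of the traversal list, rather than recomputing or caching the full answer set.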

18 Additional Features
- Ranking
- Highlighting
- Synonyms

19 Experimental Results
- Computing similar prefixes

20 Experimental Results
- Multi-prefix intersection

21 Experimental Results
- Overall scalability

22 Questions?
Thank you!
TASTIER: Efficient Auto-Completion, Type-Ahead Search
http://tastier.ics.uci.edu/
