Effective Keyword Search in Relational Databases Fang Liu (University of Illinois at Chicago) Clement Yu (University of Illinois at Chicago) Weiyi Meng.


2 Effective Keyword Search in Relational Databases. Fang Liu (University of Illinois at Chicago), Clement Yu (University of Illinois at Chicago), Weiyi Meng (Binghamton University), Abdur Chowdhury (America Online, Inc.)

3 Effective Keyword Search in Relational Databases: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work. SIGMOD 2006: Effective Keyword Search in Relational Databases

4 Introduction. Why keyword search in relational databases? We want to search text data stored in relational databases, but SQL with the “contains” operator is not suitable for non-expert users. Keyword search has been tremendously successful in text databases, where documents are ranked by their similarity to the query, and it is designed for non-expert users.

5 Introduction. Text data in relational databases

6 Introduction. Suppose a user is looking for albums titled “off the wall”.

7 Introduction. Keyword search has been very successful in text databases, where documents are ranked by their similarity to the query; Google, Yahoo, and MSN Search are examples. So, let's do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER and IR-style DISCOVER, ObjectRank, Ranking Objects)

8 Introduction. Let's do it, but how? What are the answers to be ranked? How should we rank these answers?

9 Introduction: an answer. An answer for a given query Q is a tuple tree in which every leaf node contains at least one keyword of Q.
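
To make the definition concrete, here is a minimal Python sketch of the answer test. It assumes a hypothetical in-memory representation (`TupleNode`, holding a table name and the text of the tuple's text columns) and a naive lowercase whitespace tokenizer. Because a tuple tree is an undirected join tree, a leaf is any tuple with at most one neighbor, so a root with a single child also counts as a leaf.

```python
from dataclasses import dataclass, field


@dataclass
class TupleNode:
    """One tuple of a tuple tree (hypothetical in-memory representation)."""
    table: str
    text: str  # concatenated text-column values of the tuple
    children: list["TupleNode"] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Naive tokenizer: lowercase, split on whitespace (no stemming)."""
    return set(text.lower().split())


def leaves(root: TupleNode) -> list[TupleNode]:
    """Tuples with at most one neighbor in the undirected join tree."""
    result = []
    stack = [(root, False)]  # (node, has_parent)
    while stack:
        node, has_parent = stack.pop()
        degree = len(node.children) + (1 if has_parent else 0)
        if degree <= 1:
            result.append(node)
        stack.extend((child, True) for child in node.children)
    return result


def is_answer(tree: TupleNode, query: list[str]) -> bool:
    """An answer: every leaf tuple contains at least one query keyword."""
    keywords = {k.lower() for k in query}
    return all(tokens(leaf.text) & keywords for leaf in leaves(tree))


artist = TupleNode("Artist", "Michael Jackson")
album = TupleNode("Album", "Off the Wall", [artist])
print(is_answer(album, ["off", "wall", "jackson"]))  # True
print(is_answer(album, ["off", "wall"]))             # False
```

With the hypothetical tuples above, the two-tuple tree is an answer to "off wall jackson" but not to "off wall", because the artist leaf contains neither keyword.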

10 Introduction. We use a slightly modified version of the DISCOVER algorithm [DISCOVER] to produce all answers for a given query.
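
DISCOVER enumerates candidate networks over the schema and evaluates them as joins; the modification used here is not described on the slide. As a rough brute-force stand-in (explicitly not DISCOVER), the sketch below enumerates only path-shaped tuple trees over a small in-memory tuple graph and keeps a path when both of its end tuples, which are its leaves, contain a query keyword. Tuple ids, the `edges` foreign-key pairs, and `max_size` are hypothetical.

```python
def tokens(text: str) -> set[str]:
    """Naive tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())


def path_answers(texts: dict[str, str], edges: list[tuple[str, str]],
                 query: list[str], max_size: int = 3) -> list[tuple[str, ...]]:
    """Enumerate path-shaped tuple trees (up to max_size tuples) whose leaves hold a keyword.

    texts: tuple id -> text of that tuple (hypothetical representation)
    edges: pairs of tuple ids that join through a foreign key (hypothetical)
    """
    keywords = {k.lower() for k in query}
    adjacency: dict[str, list[str]] = {t: [] for t in texts}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def has_keyword(t: str) -> bool:
        return bool(tokens(texts[t]) & keywords)

    answers: list[tuple[str, ...]] = []

    def extend(path: list[str]) -> None:
        # The two ends of a path are its leaves; report each undirected path once.
        if has_keyword(path[0]) and has_keyword(path[-1]) and path[0] <= path[-1]:
            answers.append(tuple(path))
        if len(path) < max_size:
            for nxt in sorted(adjacency[path[-1]]):
                if nxt not in path:  # keep the path simple (no repeated tuples)
                    extend(path + [nxt])

    for start in sorted(texts):
        extend([start])
    return answers


texts = {"al1": "Off the Wall", "al2": "Thriller", "ar1": "Michael Jackson"}
edges = [("al1", "ar1"), ("al2", "ar1")]
print(path_answers(texts, edges, ["wall", "jackson"]))
```

Real tuple trees can branch (for example, one artist joined to two albums that both contain keywords), so a faithful generator would enumerate trees rather than paths.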

11 Introduction: Ranking. Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.

12 Introduction: Contributions. We identify four new factors that are critical to effective ranking and propose a new ranking strategy. We design and conduct comprehensive experiments on the effectiveness problem. Experimental results show that our strategy is significantly more effective than existing work.

13 Effective Keyword Search in Relational Databases: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work

14 3.3 IR Ranking. Q = (k1, k2, ..., kn) is a query, D is a document, and Sim(Q, D) is the ranking score of D. Example values: tf = 2 gives ntf = 1.53 and tf = 10 gives ntf = 2.2; a term that occurs in half of the documents has idf = 0.69, in 1/100 of them idf = 4.6, and in 1/200,000 of them idf = 12; with s = 0.2, a document whose length is 1, 1/2, 1/10, 2, or 10 times the average length has ndl = 1, 0.9, 0.8, 1.2, or 2.8, respectively.
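
The formula itself did not survive on the slide, but the example values match the standard pivoted-normalization weighting, which this Python sketch assumes: ntf = 1 + ln(1 + ln(tf)), idf = ln(N / df), ndl = (1 - s) + s * dl / avdl, weight(k, D) = ntf * idf / ndl, and Sim(Q, D) = sum of weight(k, Q) * weight(k, D) over keywords k in both Q and D. Taking weight(k, Q) to be the keyword's frequency in the query is a further assumption.

```python
import math


def ntf(tf: int) -> float:
    """Normalized term frequency, 1 + ln(1 + ln(tf)), for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))


def idf(n_docs: float, df: float) -> float:
    """Inverse document frequency, ln(N / df)."""
    return math.log(n_docs / df)


def ndl(dl: float, avdl: float, s: float = 0.2) -> float:
    """Pivoted document length normalization, (1 - s) + s * dl / avdl."""
    return (1 - s) + s * dl / avdl


def sim(query: list[str], doc_tf: dict[str, int], df: dict[str, int],
        n_docs: int, dl: float, avdl: float, s: float = 0.2) -> float:
    """Sim(Q, D): sum of weight(k, Q) * weight(k, D) over keywords shared by Q and D."""
    score = 0.0
    for k in set(query):
        tf = doc_tf.get(k, 0)
        if tf == 0:
            continue  # a keyword absent from D contributes nothing
        weight_q = query.count(k)  # assumption: weight(k, Q) = query term frequency
        weight_d = ntf(tf) * idf(n_docs, df[k]) / ndl(dl, avdl, s)
        score += weight_q * weight_d
    return score


# Reproduce the example values from the slide.
print(round(ntf(2), 2), round(ntf(10), 1))                                  # 1.53 2.2
print(round(idf(2, 1), 2), round(idf(100, 1), 1), round(idf(200_000, 1)))  # 0.69 4.6 12
print([round(ndl(r, 1), 1) for r in (1, 0.5, 0.1, 2, 10)])                 # [1.0, 0.9, 0.8, 1.2, 2.8]
```

The double logarithm in ntf keeps repeated occurrences from dominating (tf = 10 scores only about 1.4 times tf = 2), and s controls how strongly long documents are penalized.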

15 Effective Keyword Search in Relational Databases: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work

16 Our Ranking Strategy. A tuple tree is viewed as a collection of documents, T = (D1, D2, ..., Dn), so the document score Sim(Q, D) is extended to a tuple-tree score Sim(Q, T).
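
The simplest reading of Sim(Q, D) → Sim(Q, T) is to treat the whole tuple tree as one concatenated document and reuse the document formula. The Python sketch below does exactly that, under the same assumed pivoted-normalization formulas used for text databases; it is an illustrative baseline, not the authors' final score, which the following slides refine. The average tree text length `avdl` is a hypothetical parameter.

```python
import math
from collections import Counter


def super_document_score(query: list[str], docs: list[str], df: dict[str, int],
                         n_docs: int, avdl: float, s: float = 0.2) -> float:
    """Baseline: score T = (D1, ..., Dn) as one concatenated document."""
    tf = Counter()
    for d in docs:
        tf.update(d.lower().split())
    dl = sum(tf.values())                  # length of the concatenated text
    ndl = (1 - s) + s * dl / avdl          # avdl: hypothetical average tree text length
    score = 0.0
    for k, qtf in Counter(q.lower() for q in query).items():
        if tf[k] == 0:
            continue  # keyword absent from every document of T
        ntf = 1 + math.log(1 + math.log(tf[k]))
        score += qtf * ntf * math.log(n_docs / df[k]) / ndl
    return score


score = super_document_score(["wall", "jackson"], ["Off the Wall", "Michael Jackson"],
                             {"wall": 1, "jackson": 1}, n_docs=100, avdl=5)
print(score)
```

Concatenation ignores which document each match came from and how many tuples the tree has, which is exactly the information the tuple-tree normalizations use.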

18 Our Ranking Strategy: Tuple Tree Size Normalization, based on the number of tuples in a tuple tree T.
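
The slide names only the quantity involved, the number of tuples in T. A minimal sketch, assuming a pivoted form that mirrors ndl and a hypothetical average tree size `avgsize`, divides a tree's raw score by a factor that grows with its size, so that a large tree does not outrank a small one merely by collecting more matching text. The functional form here is our assumption.

```python
def ntsize(size: int, avgsize: float, s: float = 0.2) -> float:
    """Hypothetical pivoted tuple-tree size normalization, mirroring ndl.

    size: number of tuples in the tuple tree T
    avgsize: average number of tuples over candidate tuple trees (assumption)
    """
    return (1 - s) + s * size / avgsize


def size_normalized(raw_score: float, size: int, avgsize: float, s: float = 0.2) -> float:
    """Divide a tree's raw score by its size normalization factor."""
    return raw_score / ntsize(size, avgsize, s)


# Two trees with the same raw score: the 1-tuple tree now outranks the 4-tuple tree.
print(size_normalized(10.0, 1, 2.0) > size_normalized(10.0, 4, 2.0))  # True
```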

19 Our Ranking Strategy: Document Length Normalization Reconsidered. The normalization of D_i uses the document length of D_i and the average document length of the text column that D_i belongs to.
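
A sketch of the per-column idea, assuming the same pivoted form as in the IR ranking slide: each document D_i is normalized against the average length of its own text column instead of one global average. The column contents below are hypothetical.

```python
def column_ndl(doc: str, column_docs: list[str], s: float = 0.2) -> float:
    """Pivoted length normalization of doc against the average length of its own text column."""
    dl = len(doc.split())
    avdl = sum(len(d.split()) for d in column_docs) / len(column_docs)
    return (1 - s) + s * dl / avdl


titles = ["Off the Wall", "Under My Skin"]  # hypothetical album-title column
lyrics = ["word " * 200]                    # hypothetical column of 200-word lyrics
print(round(column_ndl("Off the Wall", titles), 3))  # 1.0
print(round(column_ndl("Off the Wall", lyrics), 3))  # 0.803
```

A three-word album title is an average-length document within a title column, but measured against a lyrics column it would look unusually short and receive an inflated weight.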

20 Our Ranking Strategy: Document Frequency Normalization

21 Our Ranking Strategy. T = (D1, D2, ..., Dn); for a keyword k, maxWgt is the maximum of weight(k, Di) and sumWgt is the sum of weight(k, Di) over the documents Di of T.
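
maxWgt and sumWgt suggest a combination that is dominated by the best-matching document while still rewarding a keyword that appears in several documents of T. The Python sketch assumes the form maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))), which reuses the double-log damping of ntf; treat it as an illustration of the idea rather than a definitive formula.

```python
import math


def combine_weights(weights: list[float]) -> float:
    """Combine weight(k, D_i) over the documents D_i of a tuple tree.

    Assumed form: maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))).
    """
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0  # keyword occurs in no document of T
    max_wgt = max(positive)
    sum_wgt = sum(positive)
    # sum_wgt / max_wgt >= 1, so both logarithms are non-negative.
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


print(combine_weights([2.0, 0.0]))           # 2.0
print(round(combine_weights([1.0, 1.0]), 2))  # 1.53
```

With a single matching document the combined weight equals maxWgt; two documents of weight 1.0 give about 1.53 rather than the plain sum 2.0.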

22 Our Ranking Strategy. T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

23 Our Ranking Strategy. Schema terms in queries, e.g., “lyrics for how come by d12” and “usher the singer's lyrics to burn”. Phrase-based ranking: using position information to boost phrase matching. Concept-based ranking: can improve effectiveness and can assign semantics to answers.
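
Phrase-based ranking needs term positions. A minimal sketch, assuming a hypothetical per-document positional index built by whitespace tokenization, finds where the query terms occur consecutively and in order. How much a phrase match is boosted is not stated on the slide, so the example only reports match positions.

```python
from collections import defaultdict


def positional_index(doc: str) -> dict[str, list[int]]:
    """Map each lowercase token to the positions where it occurs in doc."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, word in enumerate(doc.lower().split()):
        index[word].append(pos)
    return index


def phrase_starts(index: dict[str, list[int]], phrase: list[str]) -> list[int]:
    """Positions p where phrase[0] occurs at p, phrase[1] at p + 1, and so on."""
    terms = [t.lower() for t in phrase]
    if not terms:
        return []
    return [p for p in index.get(terms[0], [])
            if all(p + i in index.get(t, []) for i, t in enumerate(terms))]


print(phrase_starts(positional_index("Off the Wall"), ["off", "the", "wall"]))     # [0]
print(phrase_starts(positional_index("the wall is off"), ["off", "the", "wall"]))  # []
```

Both texts contain all three keywords, so a bag-of-words score cannot tell them apart; only the positions reveal that the first one matches the album title “off the wall” as a phrase.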

24 Effective Keyword Search in Relational Databases: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work

25 Experiments: data set. A lyrics database; 50 queries from an AOL query log; relevance judgments by pooling + logs.

26 Experiments: some queries. “to me lyrics by lionel richie”; “inner smile texas lyrics”; “lionel richie lyrics”; “lionel richie lyrics you mean more to me”; “avril lavigne lyrics for the album under this skin”; “avril lavigne lyrics”

27 Experiments: measures. Reciprocal rank measures how well the system returns the first relevant answer: it is 1/r, where r is the rank of the first relevant answer. MAP (mean average precision): a precision value is computed after each relevant answer is retrieved, and these values are averaged into a single number (then averaged over all queries) that measures overall effectiveness.
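
Both measures are short to compute. The Python sketch below takes, for each query, a ranked list of answer ids and the set of relevant ids (hypothetical data). It divides average precision by the total number of relevant answers, the usual TREC convention, so relevant answers that are never retrieved count as precision 0; dividing by the number of relevant answers actually retrieved would follow the slide's wording more literally.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant answer, or 0 if none is retrieved."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of the precision values taken at each relevant answer's rank.

    Divides by the total number of relevant answers, so relevant answers that
    are never retrieved count as precision 0 (the usual TREC convention).
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            hits += 1
            total += hits / rank  # precision after this relevant answer
    return total / len(relevant)


def evaluate(runs: list[tuple[list[str], set[str]]]) -> tuple[float, float]:
    """Return (mean reciprocal rank, MAP) over (ranked answers, relevant set) pairs, one per query."""
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
    map_score = sum(average_precision(r, rel) for r, rel in runs) / len(runs)
    return mrr, map_score


runs = [(["a", "b", "c", "d"], {"b", "d"}),  # relevant answers at ranks 2 and 4
        (["x", "y"], {"y", "z"})]            # "z" is never retrieved
print(evaluate(runs))  # (0.5, 0.375)
```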

28 Experiments: results. Our ranking strategy: the four new factors.

29 Experiments: results. Comparison with related work.

30 Effective Keyword Search in Relational Databases: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work

31 Conclusions. Effectiveness is as important as efficiency. The four new factors are critical to search effectiveness. Our strategy is significantly more effective than related work.

32 Future Work. Utilize link analysis; combine non-text columns; address the efficiency problem; use more real-world data sets.

33 Questions?

