Effective Keyword Search in Relational Databases Fang Liu (University of Illinois at Chicago) Clement Yu (University of Illinois at Chicago) Weiyi Meng.

Presentation transcript:

Effective Keyword Search in Relational Databases Fang Liu (University of Illinois at Chicago) Clement Yu (University of Illinois at Chicago) Weiyi Meng (Binghamton University) Abdur Chowdhury (America Online, Inc.)

Effective Keyword Search in Relational Databases Introduction IR ranking in text databases Our ranking strategy in RDBs Experiments Conclusions and future work SIGMOD 2006: Effective Keyword Search in Relational Databases

Introduction: Why keyword search in relational databases? We want to search text data stored in relational databases. SQL with the “contains” operator is not suited to non-expert users. Keyword search is tremendously successful in text databases because it ranks documents by similarity, and it is suited to non-expert users.

Introduction: Text data in relational databases

Introduction: Suppose a user is looking for albums titled “off the wall”.

Introduction: Keyword search is very successful in text databases because it ranks documents by similarity; Google, Yahoo and MSN Search are examples. So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER & IR-style DISCOVER, ObjectRank, Ranking Objects)

Introduction: Let’s do it, but how? What are the answers to be ranked? How should we rank these answers?

Introduction: An Answer. An answer for a given query Q is a tuple tree in which every leaf node contains at least one keyword in Q.
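The leaf condition in this definition can be sketched as a small check. This is only an illustration: the function name `is_answer`, the representation of a tuple tree as a dict plus an edge list, and the sample rows are assumptions, not part of the talk.

```python
from collections import defaultdict

def is_answer(tree_text, edges, query):
    """Check the leaf condition: every leaf tuple holds at least one query keyword.

    tree_text: dict mapping each tuple id in the tree to its text (hypothetical layout)
    edges:     list of (tuple_id, tuple_id) joins that form the tree
    query:     list of keywords
    """
    keywords = {k.lower() for k in query}
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # A leaf has at most one neighbour; a single-tuple tree is its own leaf.
    leaves = [t for t in tree_text if degree[t] <= 1]
    return all(keywords & set(tree_text[t].lower().split()) for t in leaves)

# Made-up rows for illustration only.
query = ["jackson", "wall"]
two = {"artist1": "Michael Jackson", "album1": "Off the Wall"}
three = {**two, "song1": "Rock with You"}

print(is_answer(two, [("artist1", "album1")], query))                          # True
print(is_answer(three, [("artist1", "album1"), ("album1", "song1")], query))   # False
```

The second tree is rejected because its leaf `song1` contains neither keyword, even though the tree as a whole covers the query.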

Introduction: We use a slightly modified version of the DISCOVER algorithm [DISCOVER] to produce all answers for a given query.

Introduction: Ranking. Our focus is the effectiveness of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.

Introduction: Contributions. We identify four new factors that are critical to effective ranking and propose a new ranking strategy. We design and conduct comprehensive experiments on effectiveness. Experimental results show that our strategy is significantly more effective than existing work.

3.3 IR Ranking: Q = (k1, k2, ..., kn), D is a document, and Sim(Q, D) is the ranking score of D.
Sample values for ntf: tf=2, ntf=1.53; tf=10, ntf=2.2.
Sample values for idf: half, idf=0.69; 1/100, idf=4.6; 1/200,000, idf=12.
Sample values for ndl (s=0.2): 1, ndl=1; half, ndl=0.9; 1/10, ndl=0.8; 2, ndl=1.2; 10, ndl=2.8.
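The formula itself did not survive in this transcript, but the sample values above are consistent with the widely used pivoted-normalization weighting. The sketch below assumes those forms: the function names are illustrative, the idf is written in terms of the fraction of documents containing the term (ignoring any smoothing constant), and the query-side weight is taken as 1.

```python
import math

def ntf(tf):
    """Normalized term frequency (assumed form): 1 + ln(1 + ln(tf)), for tf >= 1."""
    return 1 + math.log(1 + math.log(tf))

def idf_from_fraction(fraction):
    """Inverse document frequency for a term found in `fraction` of all documents."""
    return math.log(1 / fraction)

def ndl(relative_length, s=0.2):
    """Normalized document length (assumed form), where relative_length = dl / avgdl."""
    return (1 - s) + s * relative_length

def weight(tf, fraction, relative_length, s=0.2):
    """Weight of one keyword in one document: (ntf / ndl) * idf."""
    return ntf(tf) / ndl(relative_length, s) * idf_from_fraction(fraction)

def sim(matches, s=0.2):
    """Sim(Q, D): sum of keyword weights over the query keywords found in D.

    matches: list of (tf, fraction, relative_length), one per matching keyword.
    """
    return sum(weight(tf, frac, rel, s) for tf, frac, rel in matches)

# These reproduce the sample values listed on the slide, up to rounding.
print(round(ntf(2), 2), round(ntf(10), 1))        # 1.53 2.2
print(round(idf_from_fraction(0.5), 2))           # 0.69
print(round(ndl(0.5), 1), round(ndl(10), 1))      # 0.9 2.8
```

The double logarithm dampens repeated occurrences of a term, and `s` controls how strongly long documents are penalized.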

Our Ranking Strategy: T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

Our Ranking Strategy: Tuple Tree Size Normalization. Size: # of tuples in a tuple tree T.

Our Ranking Strategy: Document Length Normalization Reconsidered. Terms: the document length of Di; the average document length of the text column of Di.

Our Ranking Strategy: Document Frequency Normalization

Our Ranking Strategy: T = (D1, D2, ..., Dn). maxWgt is the maximum weight(k, Di); sumWgt is the sum of weight(k, Di).
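The two quantities defined here can be computed directly from the per-document weights of a keyword k. How they are combined into a single weight for the tuple tree is not stated in this transcript; the combination below, which reuses the double-logarithm damping of term frequency, is an assumption for illustration, as is the function name `combine_weights`.

```python
import math

def combine_weights(weights):
    """Combine weight(k, D_i) over the documents D_1..D_n of a tuple tree T.

    max_wgt and sum_wgt follow the slide's definitions. The combining formula,
    max_wgt * (1 + ln(1 + ln(sum_wgt / max_wgt))), is an ASSUMED form.
    """
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0  # keyword k does not occur in any document of T
    max_wgt = max(positive)
    sum_wgt = sum(positive)
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))

print(combine_weights([2.0]))            # 2.0: a single match keeps its own weight
print(combine_weights([2.0, 2.0]) < 4)   # True: a second match adds less than its full weight
```

Under this assumed form the strongest single match dominates, and additional matches of the same keyword in other tuples contribute with diminishing returns.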

Our Ranking Strategy: T = (D1, D2, ..., Dn), so Sim(Q, D) → Sim(Q, T)

Our Ranking Strategy: Schema Terms in Query. Example queries: “lyrics for How come by D12”; “usher the singer's lyrics to burn”. Phrase-based Ranking: use position information to boost phrase matching. Concept-based Ranking: can improve effectiveness; can assign semantics to answers.
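Phrase matching with position information can be sketched as follows. The slide does not say how positions are stored or how large the boost is, so this minimal example only detects whether the query words occur consecutively; the helper names and the whitespace tokenization are assumptions.

```python
def positions(text):
    """Build a tiny positional index: word -> list of positions in the text."""
    index = {}
    for pos, word in enumerate(text.lower().split()):
        index.setdefault(word, []).append(pos)
    return index

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively, in order, in `text`."""
    index = positions(text)
    words = phrase.lower().split()
    if not words:
        return False
    # Try every occurrence of the first word as a candidate phrase start.
    return any(
        all(start + i in index.get(w, []) for i, w in enumerate(words))
        for start in index.get(words[0], [])
    )

print(contains_phrase("Off the Wall", "off the wall"))      # True
print(contains_phrase("the wall is off", "off the wall"))   # False
```

A ranking function could then raise the score of answers for which `contains_phrase` holds, so that an album titled “Off the Wall” outranks text that merely contains the three words.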

Experiments: Data Set. A lyrics database; 50 queries from an AOL query log; relevance judgment: pooling + logs.

Experiments: some queries to me lyrics by lionel richie inner smile texas lyrics lionel richie lyrics lionel richie lyrics you mean more to me avril lavigne lyrics for the album under this skin avril lavigne lyrics

Experiments: Measures. Reciprocal rank measures how well the system returns the first relevant answer. MAP (mean average precision): a precision value is computed after each relevant answer is retrieved, and these values are averaged into a single number that measures overall effectiveness.
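Both measures can be computed from a ranked list of relevance judgments. The sketch below follows the description on the slide and assumes every relevant answer appears somewhere in the ranked list; the function names and the boolean-list input format are illustrative.

```python
def reciprocal_rank(relevance):
    """1 / rank of the first relevant answer, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def average_precision(relevance):
    """Average of the precision values taken at each relevant answer."""
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """MAP: the mean of average precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

# One query whose relevant answers are ranked 1st and 3rd.
ranked = [True, False, True]
print(reciprocal_rank(ranked))                 # 1.0
print(round(average_precision(ranked), 3))     # 0.833
```

Reciprocal rank rewards a system for placing one relevant answer near the top, while MAP also reflects where the remaining relevant answers fall.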

Experiments – results Our ranking strategy: the four new factors.

Experiments – results Comparison with related works

Conclusions: Effectiveness is as important as efficiency. The four new factors are critical to search effectiveness. Our strategy is significantly more effective than related work.

Future Work: Utilize link analysis. Combine non-text columns. Address the efficiency problem. Use more real-world data sets.

Questions?