Combining Keyword Search and Forms for Ad Hoc Querying of Databases Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton University of.

Slides:



Advertisements
Similar presentations
Toward Scalable Keyword Search over Relational Data Akanksha Baid, Ian Rae, Jiexing Li, AnHai Doan, and Jeffrey Naughton University of Wisconsin VLDB 2010.
Advertisements

INFORMATION SOLUTIONS Citation Analysis Reports. Copyright 2005 Thomson Scientific 2 INFORMATION SOLUTIONS Provide highly customized datasets based on.
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
IMPLEMENTATION OF INFORMATION RETRIEVAL SYSTEMS VIA RDBMS.
Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Querying for Information Integration: How to go from an Imprecise Intent to a Precise Query? Aditya Telang Sharma Chakravarthy, Chengkai Li.
1 Query-by-Example (QBE). 2 v A “GUI” for expressing queries. –Based on the Domain Relational Calulus (DRC)! –Actually invented before GUIs. –Very convenient.
Final Project of Information Retrieval and Extraction by d 吳蕙如.
Search Engines and Information Retrieval
XP Chapter 3 Succeeding in Business with Microsoft Office Access 2003: A Problem-Solving Approach 1 Analyzing Data For Effective Decision Making.
Parametric search and zone weighting Lecture 6. Recap of lecture 4 Query expansion Index construction.
3-1 Chapter 3 Data and Knowledge Management
Structured Query Language Part I Chapter Three CIS 218.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education, Inc. publishing as Prentice Hall 4-1.
Information systems and databases Database information systems Read the textbook: Chapter 2: Information systems and databases FOR MORE INFO...
Overview of Search Engines
CORE 2: Information systems and Databases STORAGE & RETRIEVAL 2 : SEARCHING, SELECTING & SORTING.
Combining Keyword Search and Forms for Ad Hoc Querying of Databases (Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton) Computer Sciences.
MS Access: Database Concepts Instructor: Vicki Weidler.
IST Databases and DBMSs Todd S. Bacastow January 2005.
NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.
ASP.NET Programming with C# and SQL Server First Edition
©Silberschatz, Korth and Sudarshan5.1Database System Concepts Chapter 5: Other Relational Languages Query-by-Example (QBE) Datalog.
Search Engines and Information Retrieval Chapter 1.
NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison.
CS1100: Access Reports A (Very) Short Tutorial on Microsoft Access Report Construction Created By Martin Schedlbauer With contributions from Matthew Ekstrand-Abueg.
Online Autonomous Citation Management for CiteSeer CSE598B Course Project By Huajing Li.
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande.
10/31/2012ISC239 Isabelle Bichindaritz1 SQL Graphical Queries Design Query By Example.
Analyzing Data For Effective Decision Making Chapter 3.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Michael Cafarella Alon HalevyNodira Khoussainova University of Washington Google, incUniversity of Washington Data Integration for Relational Web.
1 Searching XML Documents via XML Fragments D. Camel, Y. S. Maarek, M. Mandelbrod, Y. Mass and A. Soffer Presented by Hui Fang.
Querying Structured Text in an XML Database By Xuemei Luo.
A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data Eric Chu, Akanksha Baid, Ting Chen, AnHai Doan, Jeffrey Naughton.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Databases: An Overview Chapter 7, Exploring the Digital Domain.
Search Engine Architecture
ITGS Databases.
1 Relational Algebra Chapter 4, Sections 4.1 – 4.2.
Improving Information Discovery for the AGU Abstract Archive Brendan Ashby, Sherry Chen, Aris Peng, Eric Rozell, Akeem Shirley Xinformatics Spring 2012.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Supporting Ranking and Clustering as Generalized Order-By and Group-By Chengkai Li (UIUC) joint work with Min Wang Lipyeow Lim Haixun Wang (IBM) Kevin.
Presented By Amarjit Datta
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Citation-Based Retrieval for Scholarly Publications 指導教授:郭建明 學生:蘇文正 M
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
CS520 Web Programming Full Text Search Chengyu Sun California State University, Los Angeles.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
Welcome: To the fifth learning sequence “ Data Models “ Recap : In the previous learning sequence, we discussed The Database concepts. Present learning:
Databases and DBMSs Todd S. Bacastow January
Supporting Ranking and Clustering as Generalized Order-By and Group-By
Query-by-Example (QBE)
Search Engine Architecture
Relational Algebra Chapter 4, Part A
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
The Relational Model Textbook /7/2018.
Introduction to Information Retrieval
Search Engine Architecture
The ultimate in data organization
Information Retrieval and Web Design
Presentation transcript:

Combining Keyword Search and Forms for Ad Hoc Querying of Databases Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton University of Wisconsin-Madison

More and more untrained users querying DBMSs E-commerce applications Structured wikipedia (e.g., DBpedia) Increasing demand for richer queries  Attr = value, range, sorting, aggregation, etc. Writing SQL queries doesn’t work  Need to know SQL and the schema SIGMOD

Keyword Search over Databases Input: Keywords Output: Ranked list of joined-tuples containing the keywords Has made much progress in recent years Disadvantage: limited query expressiveness  Field selection  Range queries  Aggregation SIGMOD

Augmenting Keyword Search Augment keyword search with new constructs Field selection not so bad  “Attr = value”  But users need to know field names Now consider adding range queries, aggregation… Keyword search becomes a new language for a subset of SQL SIGMOD

Building Query with Forms Is Simple SIGMOD “Finding publications by UW-Madison researchers who are originally from Greece”

Making Forms Support Many Queries Arbitrarily customizable  Users can select tables, columns, values… Example: QBE  Filling in values in skeleton tables Ultimately it’s close to asking them to generate SQL again SIGMOD

So, we need more-specific forms Forms that are closer to the user intent But we would need many forms to support many queries How do we get the correct form to a user? SIGMOD

Our Approach Combining keyword search and query forms: Offline:  Generate and index (potentially many) forms At query time: 1. User submits keyword query 2. System returns relevant forms 3. User selects desired form to finish query SIGMOD

Challenges Form generation  How specific/general should forms be?  How to systematically generate forms and good form descriptions? Keyword search over forms  What makes forms different from documents?  What issues arise in retrieval and ranking?  Do users find it useful? SIGMOD

Challenges Form generation  How specific/general should forms be?  How to systematically generate forms and good form descriptions? Keyword search over forms  What makes forms different from documents?  What issues arise in retrieval and ranking?  Do users find it useful? SIGMOD

Forms VS Documents Forms have parameters Query keywords could be  Terms on a form  Parameter values Not part of the query until users specify them SIGMOD

“Naïve” Keyword Search Query: “author Madison”  Author => a term on a form  Madison => a data value Naïve-AND: return forms with ALL keywords  No results Naïve-OR: Return forms with ANY keywords  “Madison” is ignored Put data values on forms  High storage and maintenance costs SIGMOD

Solution: Query Rewrite If query Q contains data value d and d is in relation R, rewrite Q to consider R “author Madison”  Madison is in tables conference, publication, … Alternatives  DI-OR  DI-AND  DIJ SIGMOD

DI-OR: Query rewrite with OR semantics DI-OR  Create Q’ = Q + R  Then search for forms with Q’ using OR- semantics Example  Q: “author Madison”  Q’: “author Madison conference publication” Handles terms that refer to data values Results often too inclusive SIGMOD

DI-AND: Query rewrite with AND semantics Example  Q: “Eric Madison”  “Eric” => author  “Madison” => conference, publication Enumerate new queries using AND semantics:  “author AND conference”  “author AND publication” SIGMOD

“Dead” Forms Some returned forms give empty results with respect to the keywords Example: a table referenced by many  Person (id, name, …)  Tutorial(rid, pid, cid)  ConferenceTalk(rid, pid, cid)  ServeConf(rid, pid, …)  WritePub(rid, pid, …) “Eric” => forms for all 5 tables  But Eric has only written a paper… SIGMOD

DIJ: Filtering “Dead” Forms Example  “Eric” => Table = Person, PID = P1 On forms having Person table  Check if other tables referencing Person have tuples with PID = 1  WritePub(W7, P1, …) Return forms for Person and WritePub tables SIGMOD

Ranking Using only Lucene’s TF-IDF function is not good enough Many similar forms  Similar form summaries  For a given query, similar forms often have same ranking scores  When query is not very specific, the best form may be hidden in a bunch of logically similar forms SIGMOD

Returning a flat list of forms is unclear SIGMOD The query “dewitt” returns a list of 210 forms

Presenting groups of forms in the results 1 st -level group: consecutive forms having the same score and based on the same relation. In each first-level group, group forms by the types of queries they support  Select-From-Where, Aggregation, Union/Intersect Display 2 nd level groups of forms in fixed order  Forms in the same 1 st level group have the same ranking scores. SIGMOD

Returning a flat list of forms is unclear SIGMOD Instead of showing a flat list of 210 forms

Returning groups of forms SIGMOD Shows 23 groups of logically similar forms Users can drill down a “right” group to find the “right” form

User Study Data Set: DBLife  5 entity tables, 9 relationship tables, 196 forms 7 CS grad students 6 information needs Alternatives  Naive-OR, Naïve-AND, DI-OR, DI-AND, DIJ Observing  # forms returned  Rank of “right” form  Time SIGMOD

Information Needs 1. Find all people who have given a tutorial at VLDB. 2. Find topics of areas related to Jeff Naughton. 3. Find people who have served as the SIGMOD PC chair. 4. Find the first author of all papers cited more than 5 times. (Range query) 5. Find the number of people who have co-authored a paper with David Dewitt. (Count query) 6. Find people who have published with David DeWitt or Jeff Naughton (Union query) SIGMOD

Queries of a User Q1: “tutorial vldb” Q2:“jeff naughton research area” Q3:“sigmod chair” (data terms only) Q4:“paper citation” Q5:“david dewitt coauthor” Q6:“dewitt naughton” (data terms only) SIGMOD

Comparing # Forms Returned SIGMOD Number of Forms Returned Naive-OR Naive- And DI-ORDI-ANDDIJ Q Q Q Q Q Q

DI-AND VS DIJ on # Forms Returned SIGMOD Average Number of Forms Returned T1T2T3T4T5T6 DI-AND DIJ DIJ eliminates dead forms # dead forms depends on the specific schema and query

Flat VS Group Ranks Highest, median, and lowest based on 7 users SIGMOD Flat RankGroup Rank HML #F HML #G T T T T T T

Breakdown of End-to-End Time Time by 7 users on 6 information needs SIGMOD Pose query (sec) Find the right form (sec) Fill out the form (sec) Total average time (sec) Standard Deviation (sec) Median (sec) T T T T T T

Conclusion Help untrained users pose wide variety of structured queries  Keyword search => forms Generating forms for wide variety of queries Keyword search of forms  Query rewrite to handle parameter values Presenting forms in groups Many issues should be further explored SIGMOD