Combining Keyword Search and Forms for Ad Hoc Querying of Databases (Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton) Computer Sciences.


Combining Keyword Search and Forms for Ad Hoc Querying of Databases (Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton) Computer Sciences Department University of Wisconsin-Madison {ericc, baid, xchai, anhai,

Contents
– Motivation
– Query Forms
– Generating Forms
– Keyword Search for Forms
– Displaying Returned Forms
– Experimental Analysis
– Related Work and References

Traditional Access Methods for Databases
Relational/XML databases are structured or semi-structured, with rich metadata, and are typically accessed through structured query languages such as SQL.
Advantages: high-quality results.
Disadvantages:
– Query languages: long learning curves
– Schemas: complex
– As a result, a small user population
“The usability of a database is as important as its capability.”

Motivation
Information discovery in databases requires:
– knowledge of the schema
– knowledge of a query language (e.g., SQL)
Challenge: this is hard for users who are uncomfortable with a formal query language.

Motivation
What is the solution? Combine form-based interfaces with keyword search:
– The user submits a keyword query.
– The system returns a ranked list of relevant forms.
– The user selects one of the forms and uses it to build a structured query.

Relational Schema of DBLife
Entity tables:
person(id, name, homepage, title, group, organization, country)
publication(id, name, booktitle, year, pages, cites, clink, link)
topic(id, name)
organization(id, name)
conference(id, name)

Relationship tables:
related_people(rid, pid1, pid2, strength)
related_topic(rid, pid, tid, strength)
related_organization(rid, pid, oid, strength)
give_tutorial(rid, pid, cid)
give_conf_talk(rid, pid, cid)
give_org_talk(rid, pid, oid)
serve_conf(rid, pid, cid, assignment)
write_pub(rid, pid, pub_id, position)
co_author(rid, pid1, pid2, strength)

Query Forms
A query form is an interface for a query template. Example: a completed form over the person relation of DBLife.

The query represented by this form is:
SELECT * FROM person WHERE organization = 'Microsoft Research'
The general template for the form is:
SELECT * FROM person WHERE name op value AND homepage op value AND title op value AND group op value AND organization op value AND country op value
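As a minimal sketch (not the authors' implementation), the Python below turns a filled-in form into a parameterized SQL query. It assumes, hypothetically, that predicates whose input box is left empty are dropped from the template, and that values are passed as `?` placeholders in the style of Python's sqlite3 module; `form_to_sql` and the operator list are illustrative names.

```python
# Hypothetical set of operators a form's drop-down list might offer.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}


def form_to_sql(table, fields):
    """Build a parameterized SELECT from a filled-in form.

    fields: list of (attribute, op, value) triples, one per form row;
    value is None or "" when the user left that input box empty.
    """
    predicates, params = [], []
    for attr, op, value in fields:
        if value is None or value == "":
            continue  # assumption: empty boxes drop out of the WHERE clause
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        predicates.append(f"{attr} {op} ?")  # value goes in as a bound parameter
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if predicates:
        sql += " WHERE " + " AND ".join(predicates)
    return sql, params


# The completed form from the slide: only "organization" is filled in.
print(form_to_sql("person", [("name", "=", None),
                             ("organization", "=", "Microsoft Research")]))
```

Binding values as parameters rather than splicing them into the string keeps user input out of the SQL text.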

How to generate forms?
Step 1: Specify a subset of SQL, called SQL', as the target language for the queries supported by the forms.

SQL': Let B = SELECT select-list FROM from-list WHERE qualification [GROUP BY grouping-list HAVING group-qualification]. A query in SQL' is either a single block B or two such blocks combined with UNION or INTERSECT.
Note: nested queries are not allowed in the FROM and WHERE clauses.

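Step 2: Determine the set of skeleton templates, which specify the main clauses and join conditions, based on the chosen subset of SQL and the database schema S_D. Let R_i be a relation with schema S_i ∈ S_D.
Case 1: R_i does not reference other relations through foreign keys:
SELECT * FROM R_i WHERE predicate-list
Case 2: R_i references other relations through foreign keys: the FROM clause lists R_i together with every relation it references, and the WHERE clause contains the foreign-key join conditions followed by the predicate-list, as in the give_tutorial example.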

Example: relation give_tutorial
give_tutorial(rid, pid, cid)
Relations referenced: person and conference
person(id, name, homepage, title, group, organization, country)
conference(id, name)
Skeleton template:
SELECT * FROM give_tutorial t, person p, conference c WHERE t.pid = p.id AND t.cid = c.id AND p.name op expr AND … AND c.name op expr
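The two cases of Step 2 can be sketched in Python as below. This is an illustration under stated assumptions, not the paper's code: the schema and foreign keys are passed in as plain dictionaries, `skeleton_template` is a hypothetical name, predicates are generated for every non-`id` attribute of the referenced relations (the slide elides the exact list with "…"), and aliases are the first letter of each relation name, numbered on collision.

```python
def skeleton_template(relation, schema, foreign_keys):
    """Return a skeleton template string for `relation`.

    schema:       {relation: [attribute, ...]}
    foreign_keys: {relation: [(fk_attr, referenced_relation, referenced_attr), ...]}
    """
    fks = foreign_keys.get(relation, [])
    if not fks:
        # Case 1: R_i references no other relation.
        preds = [f"{a} op value" for a in schema[relation] if a != "id"]
        return f"SELECT * FROM {relation} WHERE " + " AND ".join(preds)

    # Case 2: join R_i with every relation it references through a foreign key.
    from_items, joins, preds = [f"{relation} t"], [], []
    used_aliases = {"t"}
    for fk_attr, ref_rel, ref_attr in fks:
        alias, n = ref_rel[0], 2
        while alias in used_aliases:  # e.g. pid1 and pid2 both reference person
            alias = f"{ref_rel[0]}{n}"
            n += 1
        used_aliases.add(alias)
        from_items.append(f"{ref_rel} {alias}")
        joins.append(f"t.{fk_attr} = {alias}.{ref_attr}")
        preds += [f"{alias}.{a} op value" for a in schema[ref_rel] if a != "id"]
    return ("SELECT * FROM " + ", ".join(from_items)
            + " WHERE " + " AND ".join(joins + preds))


schema = {"person": ["id", "name", "homepage"],
          "conference": ["id", "name"],
          "give_tutorial": ["rid", "pid", "cid"]}
fks = {"give_tutorial": [("pid", "person", "id"), ("cid", "conference", "id")]}
print(skeleton_template("give_tutorial", schema, fks))
```

With a trimmed person schema this reproduces the join structure of the give_tutorial skeleton above.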

Step 3: Finalize the templates by modifying the skeleton templates according to the desired form specificity, i.e., how specific or general the forms should be. Form specificity has two dimensions: form complexity and data specificity.

Starting from the initial state of the form, form specificity is adjusted as follows:
– Increase its complexity by adding more parameters.
– Decrease its complexity by removing parameters.
– Increase data specificity by binding more existing parameters to constants.
– Decrease data specificity by unbinding parameters with fixed values.

Approach followed in this paper:
To adjust form complexity, divide SQL' into 4 query classes:
– SELECT: the basic SELECT-FROM-WHERE construct
– AGGR: SELECT with aggregation
– GROUP: AGGR with GROUP BY and HAVING clauses
– UNION-INTERSECT: a UNION or INTERSECT of two SELECT queries
To adjust data specificity, bind the “value” fields of the “attr op value” predicates in the WHERE clause to data values.
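A minimal sketch of the four specificity adjustments, assuming a hypothetical `FormTemplate` class (not from the paper) whose parameters are the “attr op value” predicates: a parameter is free when its value is None and bound when it holds a constant. Form complexity is modeled as the number of parameters, and the query class is restricted to the four classes listed above.

```python
from dataclasses import dataclass, field

# The four query classes of SQL', in the order the slides list them.
QUERY_CLASSES = ("SELECT", "AGGR", "GROUP", "UNION-INTERSECT")


@dataclass
class FormTemplate:
    """Hypothetical form model: a skeleton plus 'attr op value' parameters.

    params maps each attribute to None (free: the user fills it in)
    or to a constant (bound: fixed when the form is generated).
    """
    skeleton: str
    query_class: str = "SELECT"
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.query_class not in QUERY_CLASSES:
            raise ValueError(f"unknown query class: {self.query_class}")

    # Form complexity: add or remove parameters.
    def add_param(self, attr):
        self.params.setdefault(attr, None)

    def remove_param(self, attr):
        self.params.pop(attr, None)

    # Data specificity: bind or unbind *existing* parameters.
    def bind(self, attr, value):
        if attr not in self.params:
            raise KeyError(f"{attr} is not a parameter of this form")
        self.params[attr] = value

    def unbind(self, attr):
        if attr not in self.params:
            raise KeyError(f"{attr} is not a parameter of this form")
        self.params[attr] = None

    def complexity(self):
        return len(self.params)

    def free_params(self):
        return [a for a, v in self.params.items() if v is None]
```

Binding refuses unknown attributes so that changing data specificity never silently changes form complexity.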

Step 4: Map each template to a form built from standard form components: labels, drop-down lists, input boxes, and buttons.
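One way Step 4 could look, as a hedged sketch: each free attribute of a template becomes a label, a drop-down list of operators, and an input box, and the form ends with a submit button. The HTML layout, the `template_to_form` name, and the operator list are illustrative assumptions; the slides do not specify how forms are rendered.

```python
import html

# Hypothetical operator choices for each drop-down list.
OPS = ["=", "<>", "<", ">", "LIKE"]


def template_to_form(form_id, attrs):
    """Render one label + drop-down + input box per attribute, then a button."""
    rows = []
    for attr in attrs:
        a = html.escape(attr)
        options = "".join(f"<option>{html.escape(op)}</option>" for op in OPS)
        rows.append(
            f'<label for="{a}">{a}</label>'          # label component
            f'<select name="{a}_op">{options}</select>'  # drop-down list for "op"
            f'<input type="text" id="{a}" name="{a}">'   # input box for "value"
        )
    rows.append('<button type="submit">Submit</button>')  # button component
    return f'<form id="{html.escape(form_id)}">' + "".join(rows) + "</form>"


print(template_to_form("person_select", ["name", "organization"]))
```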

Keyword Search for Forms
Basic idea: use keyword search to find relevant forms, which the user then fills in to pose structured queries.
Basic approaches:
– Naïve AND: returns the forms that contain all the terms in the keyword query.
– Naïve OR: returns the forms that contain at least one of the query terms.
Drawback: the keyword query must contain schema term(s), because data terms (e.g., a person's name) do not appear on any form.
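The two naïve strategies can be sketched with a small inverted index from terms to form-ids. The form contents below are made up for illustration; the helper names are hypothetical.

```python
def build_form_index(forms):
    """forms: {form_id: set of terms on the form} -> {term: set of form_ids}."""
    index = {}
    for form_id, terms in forms.items():
        for term in terms:
            index.setdefault(term.lower(), set()).add(form_id)
    return index


def naive_and(index, query):
    """Forms containing every query term."""
    postings = [index.get(q.lower(), set()) for q in query]
    return set.intersection(*postings) if postings else set()


def naive_or(index, query):
    """Forms containing at least one query term."""
    return set().union(*(index.get(q.lower(), set()) for q in query))


forms = {"f_person": {"person", "name"},
         "f_serve": {"serve_conf", "person", "conference"},
         "f_conf": {"conference", "name"}}
idx = build_form_index(forms)
print(naive_and(idx, ["Widom", "Conference"]))  # empty: "widom" is on no form
print(naive_or(idx, ["Widom", "Conference"]))   # every form containing "conference"
```

Both calls show the drawback: the data term “Widom” contributes nothing, because it never appears on a form.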

Approaches proposed in this paper: check whether data terms from the user query appear in the database; if they do, rewrite the query with the relevant schema terms.
– Double Index OR (DI OR): evaluates the rewritten query with OR semantics.
– Double Index AND (DI AND): evaluates the rewritten query with AND semantics.

Example:
Information need: the conferences for which a researcher named “Widom” has served on the program committee.
Keyword query: “Widom Conference”, where “Widom” is a data term and “Conference” is a schema term.
Results obtained:
– Naïve AND: no forms are returned, because “Widom” does not appear on any form.
– Naïve OR: ignores “Widom” and returns all forms that contain “Conference”.
– DI OR: the query is rewritten as “Widom person conference”, because “Widom” appears in the person table, and is evaluated with OR semantics.
– DI AND: two queries, “person conference” and “widom conference”, are generated and evaluated with AND semantics, and the union of their results is returned.
DBLife tables involved: person(id, name, homepage, title, group, organization, country), conference(id, name)

Double Index OR Implementation
Indexes used:
– DataIndex: takes a data term and returns a set of <table, tuple-id> pairs.
– FormIndex: takes a term and returns a set of form-ids.
Input: a keyword query. Output: a set of form-ids.
Step 1: Probe DataIndex with each query term q_i in the query Q. If q_i is a data term, DataIndex returns a set of <table, tuple-id> pairs; add each table to the set FormTerms. Add q_i itself to FormTerms.
Step 2: Probe FormIndex with the terms in FormTerms and return the forms that contain at least one of these terms.

DI OR
Input: a keyword query Q = [q1 q2 ... qn]
Output: a set of form-ids F'
Algorithm:
FormTerms = {}, F' = {}
// Replace any data terms with table names
for each qi ∈ Q
    if DataIndex(qi) returns <table, tuple-id> pairs
        Add each table to FormTerms
    Add qi to FormTerms    // qi could be a form term
// Get form-ids based on FormTerms
FormIndex(FormTerms) => F'    // OR semantics
return F'
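The DI OR pseudocode translates almost line for line into Python. In this sketch, which is not the authors' code, DataIndex and FormIndex are modeled as plain dictionaries (term to list of (table, tuple_id) pairs, and term to set of form-ids); `di_or` and the sample index contents are hypothetical.

```python
def di_or(query, data_index, form_index):
    """DI OR: replace data terms with their tables, then OR over FormIndex.

    data_index: {term: [(table, tuple_id), ...]}   (models DataIndex)
    form_index: {term: {form_id, ...}}             (models FormIndex)
    """
    form_terms = set()
    for q in (t.lower() for t in query):
        for table, _tuple_id in data_index.get(q, []):
            form_terms.add(table)  # q is a data term: add the tables it appears in
        form_terms.add(q)          # q could also be a form (schema) term
    result = set()
    for term in form_terms:        # OR semantics: any matching term suffices
        result |= form_index.get(term, set())
    return result


data_index = {"widom": [("person", "p42")]}
form_index = {"person": {"f_person", "f_serve"},
              "conference": {"f_serve", "f_conf"}}
print(di_or(["Widom", "Conference"], data_index, form_index))
```

For “Widom Conference” this behaves like the rewritten query “Widom person conference” from the example.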

Double Index AND
Generate all possible queries that result from replacing user-supplied data terms with schema terms, evaluate each of them with AND semantics, and return the union of the results.
Problem: performing a single AND query over all the terms in FormTerms is wrong. Why? A data term may appear in multiple unrelated tables, so that no form contains all of these tables.
Concept of a bucket: the query “q1 AND q2” is evaluated as “a ∈ S_q1 AND b ∈ S_q2”, where S_qi is a bucket containing the form terms associated with q_i, and a and b are form terms from S_q1 and S_q2, respectively.

Double Index AND Implementation
Input: a keyword query. Output: a set of form-ids.
Step 1: For each q_i, start with an empty bucket S_qi. If q_i is a data term, DataIndex returns <table, tuple-id> pairs; add each table to S_qi and to FormTerms. Add q_i to S_qi and to FormTerms.
Step 2: Generate all distinct queries that take one term from each S_qi and add them to SQ'. For each query in SQ', probe FormIndex and retrieve the forms that contain all the terms in that query.

DI AND
Input: a keyword query Q = [q1 q2 ... qn]
Output: a set of form-ids F'
Algorithm:
FormTerms = {}, F' = {}
// Replace any data terms with table names
for each qi ∈ Q
    Sqi = {}    // bucket for qi
    if DataIndex(qi) returns <table, tuple-id> pairs
        for each table
            if table ∉ FormTerms
                Add table to Sqi and FormTerms
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on Sqi
SQ' = EnumQueries(∀ Sqi)    // enumerate all unique queries, each having one term from each Sqi
for each Q' ∈ SQ'
    FormIndex(Q') => F'    // AND semantics on FormIndex
return F'
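A Python sketch of DI AND, under the same dictionary-based stand-ins for DataIndex and FormIndex as before (hypothetical names and data). EnumQueries is implemented with `itertools.product`, which picks one term from each bucket. One simplification is stated plainly: buckets are deduplicated per term, and the pseudocode's global FormTerms check is omitted; duplicate queries are instead removed by collecting the enumerated queries in a set.

```python
from itertools import product


def di_and(query, data_index, form_index):
    """DI AND: one bucket per query term, AND semantics per enumerated query.

    Returns (form_ids, enumerated_queries).
    """
    buckets = []
    for q in (t.lower() for t in query):
        bucket = {table for table, _tuple_id in data_index.get(q, [])}  # tables holding q
        bucket.add(q)  # q itself may be a schema term
        buckets.append(bucket)
    if not buckets:
        return set(), set()

    queries = set(product(*buckets))  # EnumQueries: one term from each bucket
    forms = set()
    for terms in queries:
        postings = [form_index.get(t, set()) for t in terms]
        forms |= set.intersection(*postings)  # AND semantics within one query
    return forms, queries


data_index = {"widom": [("person", "p42")]}
form_index = {"person": {"f_person", "f_serve"},
              "conference": {"f_serve", "f_conf"}}
forms, queries = di_and(["Widom", "Conference"], data_index, form_index)
print(queries)  # ("person", "conference") and ("widom", "conference")
print(forms)
```

The enumerated queries match the two queries in the “Widom Conference” example.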

Example: a user searches for a person “John Doe”, who is present in the person table but is not involved in any relationship.
What will the output be? The forms over the person table plus the forms over tables that reference person.
User action: the user enters “John Doe” in the name field of a form that joins, say, the person and conference tables.
Output: no results. Such forms are DEAD FORMS.

Double Index Join
DI Join checks whether a form will return an answer when instantiated with the data terms in the user query.
How is the check performed?
Step 1: Given a keyword query Q, probe DataIndex with each query term q_i. When q_i is a data term that yields a set of <table, tuple-id> pairs, look up each table T in the schema graph for S_D and find the reference tables that reference T. For each reference table, check whether it contains any tuple-id of T. If it does not, retrieve the forms that contain both T and the reference table and record these “dead” forms in a set X.
Step 2: Return F' − X, which filters out the dead forms.

DI Join
Input: a keyword query Q = [q1 q2 ... qn]
Output: a set of form-ids F'
Algorithm:
FormTerms = {}, F' = {}, X = {}
for each qi ∈ Q
    Sqi = {}
    if DataIndex(qi) returns <table, tuple-id> pairs
        for each table T
            let I be the set of tuple-ids from T
            if T ∉ FormTerms
                Add T to Sqi and FormTerms
            SchemaGraph(T) returns refTables
            for each refTable
                if DataIndex(refTable:tid) is NULL for every tid ∈ I
                    FormIndex(T AND refTable) => X
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on form terms
SQ' = EnumQueries(∀ Sqi)
for each Q' ∈ SQ'
    FormIndex(Q') => F'
return F' − X
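Below is a hedged Python sketch of DI Join built on the DI AND enumeration. Two stand-ins are assumptions, not the paper's data structures: `schema_graph` maps a table to the tables that reference it, and `references` replaces the `DataIndex(refTable:tid)` probe with a set of the tuple-ids of T that each reference table actually points to. As in the DI AND sketch, buckets are deduplicated per term instead of through the global FormTerms check.

```python
from itertools import product


def di_join(query, data_index, form_index, schema_graph, references):
    """DI Join: DI AND results minus dead forms.

    schema_graph: {table: [tables that reference it]}         (hypothetical)
    references:   {(ref_table, table): {referenced tuple_ids}}  (hypothetical)
    """
    buckets, dead = [], set()
    for q in (t.lower() for t in query):
        tids_by_table = {}
        for table, tid in data_index.get(q, []):
            tids_by_table.setdefault(table, set()).add(tid)
        for table, tids in tids_by_table.items():
            for ref in schema_graph.get(table, []):
                if not tids & references.get((ref, table), set()):
                    # No tuple of ref joins with a matching tuple: forms over both are dead.
                    dead |= form_index.get(table, set()) & form_index.get(ref, set())
        buckets.append(set(tids_by_table) | {q})
    if not buckets:
        return set()
    forms = set()
    for terms in set(product(*buckets)):  # one term per bucket, AND semantics
        forms |= set.intersection(*(form_index.get(t, set()) for t in terms))
    return forms - dead  # F' - X


data_index = {"doe": [("person", "p9")]}
form_index = {"person": {"f_person", "f_tut"},
              "give_tutorial": {"f_tut"},
              "conference": {"f_tut", "f_conf"}}
schema_graph = {"person": ["give_tutorial"]}
references = {("give_tutorial", "person"): {"p1", "p2"}}
print(di_join(["Doe"], data_index, form_index, schema_graph, references))
```

Because person p9 never appears in give_tutorial, the join form f_tut is filtered out as dead.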

Displaying Returned Forms
How are the returned forms ranked? By the scoring function of the Lucene index. The Lucene score for a query Q and a document D is:
$$\mathrm{score}(Q,D) = \mathrm{coord}(Q,D)\cdot\mathrm{queryNorm}(Q)\cdot\sum_{t \in Q}\Big(\mathrm{tf}(t \in D)\cdot\mathrm{idf}(t)^{2}\cdot\texttt{t.getBoost()}\cdot\mathrm{norm}(t,D)\Big)$$

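Problem: “sister forms”, i.e., several returned forms built over the same tables.
Illustration: for the user query “Widom”, the result list is so crowded with sister forms that it is very hard to find the form the user is looking for.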

What is the solution? Grouping forms.
Approach 1:
– Group consecutive sister forms with the same score into first-level groups.
– Within each group, group the forms by the four query classes.
– Display the classes in the order SELECT, AGGR, GROUP, UNION-INTERSECT.
Result of the “Widom” query: (figure).
Problem: non-consecutive sister forms end up in different first-level groups that have the same description.
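Approach 1 can be sketched with `itertools.groupby`, which groups only *consecutive* items; that is exactly why non-consecutive sister forms fall into separate first-level groups. In this sketch forms are (form_id, score, query_class) tuples in rank order, and consecutive forms with equal scores are assumed to be sister forms; both are illustrative assumptions.

```python
from itertools import groupby

# Display order of the query classes within a first-level group.
CLASS_ORDER = ["SELECT", "AGGR", "GROUP", "UNION-INTERSECT"]


def group_by_score(ranked):
    """ranked: [(form_id, score, query_class), ...] in rank order.

    Returns [(score, {query_class: [form_id, ...]}), ...], one entry per
    run of consecutive forms with the same score.
    """
    groups = []
    for score, run in groupby(ranked, key=lambda f: f[1]):
        run = list(run)
        by_class = {c: [f[0] for f in run if f[2] == c] for c in CLASS_ORDER}
        groups.append((score, {c: ids for c, ids in by_class.items() if ids}))
    return groups


ranked = [("a", 2.0, "AGGR"), ("b", 2.0, "SELECT"),
          ("c", 1.0, "GROUP"), ("d", 2.0, "SELECT")]
for g in group_by_score(ranked):
    print(g)  # "d" lands in its own group, apart from "a" and "b"
```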

Solution? Approach 2:
– First group the returned forms by their table(s).
– Order the groups by the sum of their forms' scores.
Advantage: no repetition.
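Approach 2 is a two-step transformation: bucket the forms by their table set, then sort the buckets by total score. A minimal sketch, assuming each returned form is represented as (form_id, score, tables) with `tables` a tuple of table names; the function name is hypothetical.

```python
def group_by_table(ranked):
    """ranked: [(form_id, score, tables), ...] -> [(tables, [(form_id, score), ...]), ...]

    Groups are ordered by the sum of their forms' scores, highest first, so
    sister forms always appear together regardless of their original ranks.
    """
    groups = {}
    for form_id, score, tables in ranked:
        groups.setdefault(tables, []).append((form_id, score))
    return sorted(groups.items(),
                  key=lambda item: sum(score for _, score in item[1]),
                  reverse=True)


ranked = [("a", 2.0, ("person",)), ("b", 1.5, ("person", "conference")),
          ("c", 1.0, ("person",)), ("d", 2.0, ("person", "conference"))]
for tables, members in group_by_table(ranked):
    print(tables, members)
```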

Experimental Analysis
Experimental setup:
– Data set: DBLife.
– Generated set of forms F1: 14 skeleton templates, one for each of the 5 entity tables and 9 relationship tables. From each skeleton, 14 templates were created (1 SELECT, 5 AGGR, 6 GROUP, 2 UNION-INTERSECT), so F1 had 14 × 14 = 196 forms.
– A real-life user study was conducted with 7 graduate students, who searched for answers to 6 information needs.

Experimental Analysis
The experiments compare the Naïve, Double-Index, and Double-Index-Join approaches, as well as the strategies for ranking and displaying forms. Which approach is best, and why? Let's find out.

Related Work and References
– Jayapandian [11] proposed automatic form generation for a database based on a sample query workload.
– Liu [14] proposed automatically distinguishing between schema terms and value terms in a keyword query.
– BANKS [3] proposed supporting the “attribute = value” construct in keyword queries.
– Luo [16] proposed detecting empty-result queries by “remembering” the results of previously executed empty-result queries.
References:
[3] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. ICDE.
[11] M. Jayapandian and H. V. Jagadish. Automating the Design and Construction of Query Forms. ICDE 2006.
[14] F. Liu, C. Yu, W. Meng, and A. Chowdhury. Effective Keyword Search in Relational Databases. SIGMOD 2006.
[16] G. Luo. Efficient Detection of Empty-Result Queries. VLDB 2006.

Thank You!