Combining Keyword Search and Forms for Ad Hoc Querying of Databases (Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton) Computer Sciences.


1 Combining Keyword Search and Forms for Ad Hoc Querying of Databases. Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton. Computer Sciences Department, University of Wisconsin-Madison. {ericc, baid, xchai, anhai, naughton}@cs.wisc.edu

2 Contents: Motivation; Query Forms; Generating Forms; Keyword Search for Forms; Displaying Returned Forms; Experimental Analysis; Related Work and References

3 Traditional Access Methods for Databases. Relational/XML databases are structured or semi-structured, with rich metadata, and are typically accessed through structured query languages such as SQL. Advantages: high-quality results. Disadvantages: query languages have long learning curves and schemas are complex, so the user population stays small. “The usability of a database is as important as its capability.”

4 Motivation. Information discovery in databases requires: (1) knowledge of the schema; (2) knowledge of a query language (example: SQL). Challenge: this is hard for users who are uncomfortable with a formal query language.

5 Motivation. What is the solution? Combine form-based interfaces with keyword search. Approach: (1) the user submits a keyword query; (2) the system returns a ranked list of relevant forms; (3) the user selects one of the forms and builds a structured query.

6 Relational Schema of DBLife. Entity tables:
person(id, name, homepage, title, group, organization, country)
publication(id, name, booktitle, year, pages, cites, clink, link)
topic(id, name)
organization(id, name)
conference(id, name)

7 Relationship tables:
related_people(rid, pid1, pid2, strength)
related_topic(rid, pid, tid, strength)
related_organization(rid, pid, oid, strength)
give_tutorial(rid, pid, cid)
give_conf_talk(rid, pid, cid)
give_org_talk(rid, pid, oid)
serve_conf(rid, pid, cid, assignment)
write_pub(rid, pid, pub_id, position)
co_author(rid, pid1, pid2, strength)

8 Query Forms Interface for a query template. Example: Completed form over the person relation of DBLife.

9 The query represented by the form is:
SELECT * FROM person WHERE organization = 'Microsoft Research'
The general template for this form is:
SELECT * FROM person WHERE name op value AND homepage op value AND title op value AND group op value AND organization op value AND country op value
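As an illustration of how a completed form could be turned into a query, here is a minimal Python sketch. The helper `build_query`, the operator whitelist, and the use of `?` placeholders are assumptions made for this example; the slides do not describe how the system instantiates a template.

```python
# Hypothetical helper: turn a filled-in form over one table into a
# parameterized SQL query. Fields the user leaves blank are dropped
# from the WHERE clause.
PERSON_FIELDS = ["name", "homepage", "title", "group", "organization", "country"]
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}  # assumed operator set


def build_query(table, fields, filled):
    """filled maps a field name to an (op, value) pair."""
    clauses, params = [], []
    for field in fields:
        if field not in filled:
            continue
        op, value = filled[field]
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Quote identifiers because names such as "group" are SQL keywords.
        clauses.append(f'"{field}" {op} ?')
        params.append(value)
    sql = f'SELECT * FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query(
    "person", PERSON_FIELDS, {"organization": ("=", "Microsoft Research")}
)
print(sql)     # SELECT * FROM "person" WHERE "organization" = ?
print(params)  # ['Microsoft Research']
```

Passing values as parameters rather than splicing them into the string keeps user input out of the SQL text.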

10 How to generate forms? Step 1: Specify a subset of SQL, called SQL’, as the target language for the queries that the forms support.

11 SQL’: Let B = SELECT select-list FROM from-list WHERE qualification [GROUP BY grouping-list HAVING group-qualification]. A query in SQL’ is either B alone or two such queries combined with UNION | INTERSECT. Note: Nested queries are not allowed in the FROM and WHERE clauses.

12 Step 2: Determine the set of skeleton templates, which specify the main clauses and join conditions, based on the chosen subset of SQL and the database schema S_D. Let R_i be a relation following a relation schema S_i ∈ S_D.
Case 1: R_i does not reference other relations with foreign keys.
SELECT * FROM R_i WHERE predicate-list
Case 2: R_i references other relations with foreign keys.
SELECT * FROM R_i and the relations it references WHERE join conditions AND predicate-list

13 Example. Relation: give_tutorial(rid, pid, cid). Relations referenced: person(id, name, homepage, title, group, organization, country) and conference(id, name). Skeleton template:
SELECT * FROM give_tutorial t, person p, conference c WHERE t.pid = p.id AND t.cid = c.id AND p.name op expr AND … AND c.name op expr
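The two cases of Step 2 can be sketched in Python as follows. This is only an illustration: the `SCHEMA` and `FOREIGN_KEYS` dictionaries, the alias names, and the assumption that every foreign key points at a column called `id` are choices made for the example, not details given in the slides.

```python
# Toy slice of the DBLife schema: table -> columns.
SCHEMA = {
    "person": ["id", "name", "homepage", "title", "group", "organization", "country"],
    "conference": ["id", "name"],
    "give_tutorial": ["rid", "pid", "cid"],
}
# Foreign keys: table -> {column: referenced table}. Assumes the referenced
# key column is always named "id".
FOREIGN_KEYS = {"give_tutorial": {"pid": "person", "cid": "conference"}}


def skeleton_template(table):
    """Return the skeleton template string for one relation."""
    fks = FOREIGN_KEYS.get(table, {})
    if not fks:
        # Case 1: no foreign keys, so the template ranges over one table.
        return f"SELECT * FROM {table} WHERE predicate-list"
    # Case 2: join the relation with every relation it references.
    from_list = [f"{table} t"]
    joins = []
    for i, (column, referenced) in enumerate(fks.items()):
        alias = f"r{i}"
        from_list.append(f"{referenced} {alias}")
        joins.append(f"t.{column} = {alias}.id")
    return (
        f"SELECT * FROM {', '.join(from_list)} "
        f"WHERE {' AND '.join(joins)} AND predicate-list"
    )


print(skeleton_template("person"))
print(skeleton_template("give_tutorial"))
```

The `predicate-list` placeholder is left symbolic, since it is filled in later when the template is finalized.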

14 Step 3: Finalize the templates by modifying the skeleton templates according to the desired form specificity: how specific or general do we want the forms to be? Form specificity has two components: form complexity and data specificity.

15 Adjusting form specificity, starting from the initial state of the form:
- Increase its complexity by adding more parameters.
- Decrease its complexity by removing parameters.
- Increase data specificity by binding more existing parameters to constants.
- Decrease data specificity by unbinding parameters with fixed values.

16 Approach followed in this paper.
To adjust form complexity, divide SQL’ into 4 query classes:
SELECT: basic SELECT-FROM-WHERE construct
AGGR: SELECT with aggregation
GROUP: AGGR with GROUP BY and HAVING clauses
UNION-INTERSECT: a UNION or INTERSECT of two SELECT queries
To adjust data specificity, bind the “value” fields of the “attr op value” predicates in the WHERE clause to data values.

17 Step 4: Map each template to a form Standard form components: Label Drop down list Input box Button

18 Keyword Search for Forms. Basic idea: use keyword search to find relevant forms, which are then used to pose structured queries. Basic approaches: Naïve AND returns the forms that contain all the terms of the keyword query. Naïve OR returns the forms that contain at least one term of the keyword query. Drawback: the keyword query must contain schema term(s).
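A small Python sketch of the two naïve strategies makes the drawback concrete. The form-ids and the terms attached to each form are hypothetical toy data; the real system indexes forms with Lucene rather than a Python dictionary.

```python
# Hypothetical toy data: form-id -> terms that appear on the form.
# Forms carry schema terms only, never data values.
FORMS = {
    "f_person": {"person", "name", "organization"},
    "f_conference": {"conference", "name"},
    "f_serve_conf": {"serve_conf", "person", "conference", "assignment"},
}


def naive_and(query):
    """Forms that contain every term of the keyword query."""
    terms = query.lower().split()
    return {fid for fid, words in FORMS.items() if all(t in words for t in terms)}


def naive_or(query):
    """Forms that contain at least one term of the keyword query."""
    terms = query.lower().split()
    return {fid for fid, words in FORMS.items() if any(t in words for t in terms)}


# "widom" is a data term, so no form mentions it.
print(sorted(naive_and("Widom conference")))  # []
print(sorted(naive_or("Widom conference")))   # ['f_conference', 'f_serve_conf']
```

With AND semantics the data term eliminates every form; with OR semantics it is silently ignored.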

19 Approaches proposed in this paper: Check whether data terms from the user query appear in the database. If they do, modify the query with the relevant schema terms. Double Index OR: the modified query is evaluated with OR semantics. Double Index AND: the modified query is evaluated with AND semantics.

20 Example. Information need: the conferences for which a researcher named “Widom” has served on the program committee. Keyword query: “Widom Conference”. Here the data term is “Widom” and the schema term is “Conference”.
Results obtained:
Naïve AND: no forms are returned, because “Widom” does not appear on any form.
Naïve OR: ignores “Widom” and returns all forms that contain “Conference”.
DI OR: because “Widom” appears in the person table, the query is rewritten as “Widom person conference” and evaluated with OR semantics.
DI AND: two queries are generated, “person conference” and “Widom conference”; each is evaluated with AND semantics and the union of the results is returned.
Relevant DBLife tables: person(id, name, homepage, title, group, organization, country), conference(id, name)

21 Double Index OR Implementation. Indexes used: DataIndex takes a data term and returns a set of (table, tuple-id) pairs. FormIndex takes a term and returns a set of form-ids. Input: a keyword query. Output: a set of form-ids.
Step 1: Probe DataIndex with each query term q_i in a query Q. If q_i is a data term, DataIndex returns a set of (table, tuple-id) pairs; add each table to the set FormTerms. Add q_i itself to FormTerms.
Step 2: Probe FormIndex with the terms in FormTerms. Return the forms that contain at least one of these terms.

22 DI OR
Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}
// Replace any data terms with table names
for each qi ∈ Q
    if DataIndex(qi) returns pairs
        Add each table to FormTerms
    Add qi to FormTerms // qi could be a form term
// Get form-ids based on FormTerms
FormIndex(FormTerms) => F’ // OR semantics
return F’
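The DI OR pseudocode can be sketched in runnable Python as below. The two dictionaries stand in for DataIndex and FormIndex, and their contents (form-ids, the tuple-id 17) are hypothetical toy data chosen to mirror the “Widom conference” example.

```python
# Stand-in for DataIndex: data term -> set of (table, tuple-id) pairs.
DATA_INDEX = {"widom": {("person", 17)}}
# Stand-in for FormIndex: term -> set of form-ids.
FORM_INDEX = {
    "person": {"f_person", "f_serve_conf"},
    "conference": {"f_conference", "f_serve_conf"},
}


def di_or(query):
    """Double Index OR: expand data terms to table names, then OR the lookups."""
    form_terms = set()
    for q in query.lower().split():
        # Replace any data term with the tables it occurs in.
        for table, _tuple_id in DATA_INDEX.get(q, ()):
            form_terms.add(table)
        # Keep the original term too: it could itself be a form term.
        form_terms.add(q)
    forms = set()
    for term in form_terms:  # OR semantics: union over all terms
        forms |= FORM_INDEX.get(term, set())
    return forms


print(sorted(di_or("Widom conference")))
# ['f_conference', 'f_person', 'f_serve_conf']
```

The query is effectively rewritten to “widom person conference”, so every form over person or conference is returned.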

23 Double Index AND. Generate all possible queries that result from replacing user-supplied data terms with schema terms, evaluate each with AND semantics, and return the union of the query results. Problem: performing a single AND query with all the terms in FormTerms is wrong. Why? A data term may appear in multiple unrelated tables, so that no form contains all of these tables. Concept of a bucket: the query “q1 AND q2” becomes “a AND b” for every a ∈ S_q1 and b ∈ S_q2, where S_qi is a “bucket” containing the form terms associated with q_i, and a and b are form terms from S_q1 and S_q2 respectively.

24 Double Index AND Implementation. Input: a keyword query. Output: a set of form-ids.
Step 1: For each q_i, the bucket S_qi is initially empty. If q_i is a data term, DataIndex returns (table, tuple-id) pairs; for each table, add the table to S_qi and FormTerms. Add q_i itself to S_qi and FormTerms.
Step 2: Generate and add to SQ’ all distinct queries, each of which takes one term from each S_qi. For each query in SQ’, probe the FormIndex and retrieve the forms that contain all terms in the query.

25 DI AND
Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}
// Replace any data terms with table names
for each qi ∈ Q
    Sqi = {} // Bucket for qi
    if DataIndex(qi) returns pairs
        for each table
            if table ∉ FormTerms
                Add table to Sqi and FormTerms
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on Sqi
SQ’ = EnumQueries(∀ Sqi) // Enumerate all unique queries, each having one term from each Sqi
for each Q’ ∈ SQ’
    FormIndex(Q’) => F’ // AND semantics on FormIndex
return F’
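A runnable Python sketch of DI AND follows. It builds one bucket per query term as described in the implementation steps and, for simplicity, omits the FormTerms membership checks of the pseudocode. The index contents are hypothetical toy data, and `itertools.product` plays the role of EnumQueries.

```python
from itertools import product

# Stand-in for DataIndex: data term -> set of (table, tuple-id) pairs.
DATA_INDEX = {"widom": {("person", 17)}}
# Stand-in for FormIndex: term -> set of form-ids.
FORM_INDEX = {
    "person": {"f_person", "f_serve_conf"},
    "conference": {"f_conference", "f_serve_conf"},
}


def di_and(query):
    """Double Index AND: one bucket per query term, AND within each
    enumerated query, union across the enumerated queries."""
    buckets = []
    for q in query.lower().split():
        # Bucket = tables containing q as a data value, plus q itself.
        bucket = {table for table, _tuple_id in DATA_INDEX.get(q, ())}
        bucket.add(q)
        buckets.append(sorted(bucket))
    if not buckets:
        return set()
    forms = set()
    # Enumerate every query that takes exactly one term from each bucket.
    for candidate in product(*buckets):
        matching = set.intersection(*(FORM_INDEX.get(t, set()) for t in candidate))
        forms |= matching
    return forms


print(di_and("Widom conference"))  # {'f_serve_conf'}
```

Here the enumerated queries are “person conference” and “widom conference”; only the first matches a form, the one that joins person and conference.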

26 Example: The user wants to search for a person “John Doe”. “John Doe” is present in the person table but is not involved in any relationship. What will be the output? The forms over the person table, plus the forms over tables that reference person, will be returned. User action: the user enters “John Doe” in the name field of a form that is a join of, say, the person and conference tables. Output? No results are returned. Such forms are DEAD FORMS.

27 Double Index Join. Checks whether a form will return an answer if it is instantiated with the data terms in the user query. How is the check performed?
Step 1: Given a keyword query Q, probe DataIndex with each query term q_i. When q_i is a data term that leads to a set of (table, tuple-id) pairs, look up each table T in a schema graph for S_D and find the reference tables that reference T. For each reference table, check whether it contains any of the tuple-ids of T. If not, retrieve the forms that contain both T and refTable and record these “dead” forms in a set X.
Step 2: Return F’ – X. This filters out the dead forms.

28 DI Join
Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}, X = {}
for each qi ∈ Q
    Sqi = {}
    if DataIndex(qi) returns pairs
        for each table T
            let I be the set of tuple-ids from T
            if T ∉ FormTerms
                Add T to Sqi and FormTerms
            SchemaGraph(T) returns refTables
            for each refTable
                if DataIndex(refTable:tid) is NULL for every tid ∈ I
                    FormIndex(T AND refTable) => X
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on form terms
SQ’ = EnumQueries(∀ Sqi)
for each Q’ ∈ SQ’
    FormIndex(Q’) => F’
return F’ – X
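The dead-form filter of DI Join can be sketched in Python as follows. All data structures are hypothetical stand-ins: `REF_TABLES` plays the role of the schema graph, and `REFERENCED_IDS` replaces the `DataIndex(refTable:tid)` probe with a precomputed set of referenced tuple-ids. The sketch also skips the FormTerms membership checks of the pseudocode.

```python
from itertools import product

# Stand-in for DataIndex: data term -> set of (table, tuple-id) pairs.
DATA_INDEX = {"doe": {("person", 42)}, "widom": {("person", 17)}}
# Stand-in for the schema graph: table -> tables that reference it.
REF_TABLES = {"person": ["serve_conf", "give_tutorial"]}
# (refTable, referenced table) -> tuple-ids of the referenced table that
# actually occur in refTable. Assumed to be precomputed.
REFERENCED_IDS = {("serve_conf", "person"): {17}, ("give_tutorial", "person"): set()}
# Stand-in for FormIndex: term -> set of form-ids.
FORM_INDEX = {
    "person": {"f_person", "f_serve_conf", "f_give_tutorial"},
    "serve_conf": {"f_serve_conf"},
    "give_tutorial": {"f_give_tutorial"},
}


def forms_with_all(terms):
    """Forms that contain every given term (AND semantics)."""
    sets = [FORM_INDEX.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def di_join(query):
    buckets, dead = [], set()
    for q in query.lower().split():
        bucket = {q}
        ids_by_table = {}
        for table, tuple_id in DATA_INDEX.get(q, ()):
            ids_by_table.setdefault(table, set()).add(tuple_id)
        for table, tuple_ids in ids_by_table.items():
            bucket.add(table)
            for ref in REF_TABLES.get(table, []):
                # A form joining table and ref is dead when ref references
                # none of the tuples that match the data term.
                if not (REFERENCED_IDS.get((ref, table), set()) & tuple_ids):
                    dead |= forms_with_all([table, ref])
        buckets.append(sorted(bucket))
    found = set()
    for candidate in product(*buckets):
        found |= forms_with_all(candidate)
    return found - dead


print(sorted(di_join("Doe")))    # ['f_person']
print(sorted(di_join("Widom")))  # ['f_person', 'f_serve_conf']
```

In this toy data “Doe” takes part in no relationship, so only the plain person form survives, while “Widom” keeps the serve_conf form because tuple 17 is referenced there.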

29 Displaying Returned Forms. How are the returned forms ranked? By the scoring function of the Lucene index. The Lucene score for a query Q and a document D is:
$$\text{score}(Q,D) = \text{coord}(Q,D) \cdot \text{queryNorm}(Q) \cdot \sum_{t \in Q} \Big( \text{tf}(t \text{ in } D) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,D) \Big)$$

30 Problem: “sister forms”. Illustration: user query “Widom”. Result of the query (screenshot not included in the transcript): it is practically impossible for the user to find the form they are looking for.

31 What is the solution? Grouping forms. Approach 1: Group consecutive sister forms with the same score into first-level groups. Then group the forms by the four query classes and display the classes in the order SELECT, AGGR, GROUP, UNION-INTERSECT. Result of the “Widom” query (screenshot not included in the transcript). Problem: non-consecutive sister forms end up in different first-level groups that have the same description.

32 Solution? Approach 2: First group the returned forms by their table. Order the groups by the sum of their scores. Advantage: no repetition.
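Approach 2 amounts to a group-by followed by a sort, as the following Python sketch shows. The result tuples, form-ids, and scores are made-up illustration data, and the order of forms inside a group is left as returned by the search.

```python
from collections import defaultdict

# Hypothetical ranked results: (form-id, table, query class, score).
results = [
    ("f1", "person", "SELECT", 0.9),
    ("f2", "serve_conf", "SELECT", 0.8),
    ("f3", "serve_conf", "AGGR", 0.7),
    ("f4", "person", "GROUP", 0.5),
]


def group_by_table(ranked_forms):
    """Group forms by table, then order groups by the sum of their scores."""
    groups = defaultdict(list)
    for form_id, table, query_class, score in ranked_forms:
        groups[table].append((form_id, query_class, score))
    return sorted(
        groups.items(),
        key=lambda item: sum(score for _, _, score in item[1]),
        reverse=True,
    )


for table, forms in group_by_table(results):
    print(table, [form_id for form_id, _, _ in forms])
# serve_conf ['f2', 'f3']
# person ['f1', 'f4']
```

Because every form belongs to exactly one table group, the same group description cannot appear twice in the result list.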

33 Experimental Analysis. Experimental setup. Data set: DBLife. Generated set of forms F1: 14 skeleton templates, one for each of the 5 entity tables and 9 relationship tables. For each skeleton, 14 templates were created (1 SELECT, 5 AGGR, 6 GROUP, 2 UNION-INTERSECT), so F1 had 14 × 14 = 196 forms. A user study was conducted with 7 graduate students, who looked for answers to 6 information needs.

34 Experimental Analysis Comparing Naïve, Double-Index, and Double-Index-Join Ranking and Displaying Forms Which is the best approach? Why? Let’s find out.

35 Related Work and References
Jayapandian [11] proposed automatic form generation for a database based on a sample query workload.
Liu [14] proposed to automatically distinguish between schema terms and value terms in a keyword query.
BANKS [3] proposed supporting the “attribute = value” construct in keyword queries.
Luo [16] proposed to detect empty-result queries by “remembering” results from previously executed empty-result queries.
[3] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. ICDE 2002.
[11] M. Jayapandian, H. V. Jagadish. Automating the Design and Construction of Query Forms. ICDE 2006.
[14] F. Liu, C. Yu, W. Meng, A. Chowdhury. Effective Keyword Search in Relational Databases. SIGMOD 2006.
[16] G. Luo. Efficient Detection of Empty-Result Queries. VLDB 2006.

36 Thank You!

