DBease: Making Databases User-Friendly and Easily Accessible Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng Database Group, Department of Computer.


1 DBease: Making Databases User-Friendly and Easily Accessible Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng Database Group, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

2 How to Access Databases?
Traditional database-access methods:
– SQL: Select title, author, booktitle, year From dblp Where title Contains “search” And booktitle Contains “cidr”
– Query-by-example (Form)
– Keyword Search: “search cidr”
CIDR'11 - DBease

3 Comparison of Different Methods (figure; axis label: Usability)

4 Keyword Search
Is traditional keyword search good enough? (screenshot callouts: “Too many results!” and “No result!”)

5 Form-based Search
Form-based search has the same problem. (screenshot callout: “Complicated and still no result!”)

6 Our Solution
– Type-Ahead Search
– Type-Ahead Search in Forms
– SQL Suggestion
(figure; axis label: Usability)

7 What is Type-Ahead Search?

8 Type-Ahead Search
Advantages
– Giving users instant feedback on the fly
– Helping users navigate the underlying data
– Tolerating inconsistencies between query and data
– Supporting synonyms
– Supporting XML data
– Supporting multiple tables

9 Problem Formulation
Data: A set of records
Query
– Q = {p_1, p_2, …, p_l}: a set of prefixes
– δ: Edit-distance threshold
Result
– The set of records that contain every query prefix or a similar form of it (conjunctive)
Edit distance: the minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another, e.g. ed(string, stang) = 2
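The edit-distance definition on this slide can be checked with a short Python sketch. `edit_distance` is the standard dynamic-programming (Levenshtein) computation. `prefix_edit_distance` is an assumption about how "similar forms" of a query prefix might be tested, namely by comparing the query against every prefix of a data word; the slides do not spell this out.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


def prefix_edit_distance(query: str, word: str) -> int:
    """Assumed reading of fuzzy prefix matching: best distance to any prefix of word."""
    return min(edit_distance(query, word[:k]) for k in range(len(word) + 1))


print(edit_distance("string", "stang"))         # the slide's example
print(prefix_edit_distance("sigmd", "sigmod"))  # a misspelled, partially typed keyword
```

Under this reading, a word matches a query prefix p when `prefix_edit_distance(p, word) <= δ`.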

10 Indexing
Trie index
– Words: each word is a path from the root to a leaf
– Inverted lists on leaves

11 Algorithm
Step 1: Find similar prefixes incrementally
Step 2: Retrieve the leaf nodes of similar prefixes
Step 3: Compute union lists of inverted lists of leaf nodes
Step 4: Intersect the union lists of query keywords
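The four steps can be sketched in Python as follows. This is a simplified illustration, not the DBease implementation: it recomputes the similar prefixes from scratch for each keyword instead of maintaining them incrementally as Step 1 describes, and the names `TrieNode`, `build_trie`, `fuzzy_prefix_records` and `search` are made up for the example. The sample records reuse the values from the Data Partition slide.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    records: set = field(default_factory=set)  # inverted list, filled at word ends


def build_trie(records: dict) -> TrieNode:
    """Index every word of every record; each word end keeps its record ids."""
    root = TrieNode()
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.records.add(rid)
    return root


def subtree_records(node: TrieNode) -> set:
    """Steps 2 and 3: union the inverted lists of all word ends below node."""
    found = set(node.records)
    for child in node.children.values():
        found |= subtree_records(child)
    return found


def fuzzy_prefix_records(root: TrieNode, keyword: str, delta: int) -> set:
    """Step 1, non-incremental: find trie prefixes within delta edits of keyword."""
    result = set()

    def visit(node, row):
        # row[j] = edit distance between this node's prefix and keyword[:j]
        if row[-1] <= delta:
            result.update(subtree_records(node))
            return  # deeper matches only contribute subsets of this subtree
        if min(row) > delta:
            return  # no extension of this prefix can get back within delta
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j, kc in enumerate(keyword, start=1):
                cost = 0 if kc == ch else 1
                new_row.append(min(row[j] + 1, new_row[j - 1] + 1, row[j - 1] + cost))
            visit(child, new_row)

    visit(root, list(range(len(keyword) + 1)))
    return result


def search(root: TrieNode, keywords: list, delta: int = 1) -> set:
    """Step 4: intersect the per-keyword union lists (conjunctive semantics)."""
    lists = [fuzzy_prefix_records(root, kw.lower(), delta) for kw in keywords]
    return set.intersection(*lists) if lists else set()


RECORDS = {
    1: "xml database VLDB albert",
    2: "xml database SIGMOD bob",
    3: "xml search VLDB albert",
    4: "xml security VLDB alice",
    5: "rdbms SIGMOD charlie",
}
ROOT = build_trie(RECORDS)
print(search(ROOT, ["xml", "se"], delta=0))  # exact prefix matching
print(search(ROOT, ["albrt", "xml"]))        # tolerates one typo per keyword
```

A keyword no longer than δ matches every record in this sketch, because the empty prefix is already within δ edits of it.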

12 Type-Ahead Search in Forms
– Type-Ahead Search
– Type-Ahead Search in Forms
(figure; axis label: Usability)

13 What is Type-Ahead Search in Forms?

14 Type-Ahead Search in Forms
Problem Formulation
– Data: A relation with multiple attributes
– Query: A set of prefixes on attributes in a form interface
– Answers: Local results of the focused attribute; global results of the relation
Advantages
– On-the-fly faceted search
– Supporting aggregation

15 Data Partition
Global Table → Local Tables

Global table:
ID  Title         Conf.   Author
1   xml database  VLDB    albert
2   xml database  SIGMOD  bob
3   xml search    VLDB    albert
4   xml security  VLDB    alice
5   rdbms         SIGMOD  charlie

Local table (Title):
ID  Title
T1  xml database
T2  xml search
T3  xml security
T4  rdbms

Local table (Conf.):
ID  Conf.
C1  VLDB
C2  SIGMOD

Local table (Author):
ID  Author
A1  albert
A2  bob
A3  alice
A4  charlie

16 Indexing
Each attribute
– Trie
– Mapping tables: Local → Global and Global → Local
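The partitioning and the two mapping tables can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names `partition` and `form_search` are made up, and a plain `str.startswith` check on the whole attribute value stands in for the per-attribute trie. The data is the global table from the Data Partition slide.

```python
from collections import defaultdict

# Global table copied from the Data Partition slide.
GLOBAL_TABLE = {
    1: {"title": "xml database", "conf": "VLDB", "author": "albert"},
    2: {"title": "xml database", "conf": "SIGMOD", "author": "bob"},
    3: {"title": "xml search", "conf": "VLDB", "author": "albert"},
    4: {"title": "xml security", "conf": "VLDB", "author": "alice"},
    5: {"title": "rdbms", "conf": "SIGMOD", "author": "charlie"},
}


def partition(global_table, attribute, id_prefix):
    """Build one local table of distinct values plus L->G and G->L mappings."""
    local_table = {}                     # local id -> distinct value
    value_to_local = {}                  # helper: value -> local id
    local_to_global = defaultdict(list)  # L-G mapping table
    global_to_local = {}                 # G-L mapping table
    for gid in sorted(global_table):
        value = global_table[gid][attribute]
        if value not in value_to_local:
            lid = f"{id_prefix}{len(value_to_local) + 1}"
            value_to_local[value] = lid
            local_table[lid] = value
        lid = value_to_local[value]
        local_to_global[lid].append(gid)
        global_to_local[gid] = lid
    return local_table, dict(local_to_global), global_to_local


def form_search(global_table, prefixes, focused):
    """prefixes: attribute -> typed prefix; focused: the attribute being edited.

    Returns (local results of the focused attribute, matching global ids).
    """
    matching = set(global_table)
    for attr, prefix in prefixes.items():
        local_table, l2g, _ = partition(global_table, attr, attr[0].upper())
        hits = set()
        for lid, value in local_table.items():
            if value.startswith(prefix):  # stand-in for the trie lookup
                hits.update(l2g[lid])     # local -> global
        matching &= hits                  # conjunctive across form fields
    local_table, _, g2l = partition(global_table, focused, focused[0].upper())
    local = sorted({local_table[g2l[g]] for g in matching})  # global -> local
    return local, sorted(matching)


titles, title_l2g, title_g2l = partition(GLOBAL_TABLE, "title", "T")
print(title_l2g)
print(form_search(GLOBAL_TABLE, {"title": "xml", "author": "al"}, "author"))
```

Typing "xml" in the title field and "al" in the author field yields the authors albert and alice as local results, with global records 1, 3 and 4 behind them.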

17 Our Solution

18 Our Solution
Form input: Title: xml; Author: al
Local results (Author): albert, alice
Global results: (xml database, albert), (xml search, albert), (xml security, alice)
[Figure: trie with nodes labeled “4: albert” and “5: alice”]
L-G Mapping Table: T1 → 1, 2; T2 → 3; T3 → 4; T4 → 5
G-L Mapping Table: 1 → T1; 2 → T1; 3 → T2; 4 → T3; 5 → T4

19 SQL Suggestion
– Type-Ahead Search
– Type-Ahead Search in Forms
– SQL Suggestion
(figure; axis label: Usability)

20 What is SQL Suggestion?

21 SQL Suggestion
Problem Formulation
– Data: A database with multiple tables
– Query: A set of keywords
– Answers: Relevant SQL queries
Advantages
– Suggest SQL queries based on keywords
– Help users formulate SQL queries to find accurate results
– Designed for both SQL programmers and Internet users
– Group answers based on SQL structures
– Support aggregation
– Support range queries

22 Our Solution
Suggest templates from keywords
– A template is a structure in the database
– Modeled as a graph. Nodes: entities (table names or attribute names). Edges: foreign keys or membership
Suggest SQL queries from templates
– Mapping between keywords and templates
[Figure panels: (a) Query, e.g. “keyword paper ir”; (b) Template; (c) SQL]

23 Template Suggestion
Template Generation
– Extension from basic entities (tables)
Template Ranking
– Template weight: PageRank
– Relevancy between a keyword and an entity: tf*idf
Algorithms
– Fagin's algorithms
– Threshold-based pruning techniques

24 SQL Suggestion
SQL suggestion model
– Mapping from keywords to templates
– A matching is a set of mappings that covers all keywords
– Weighted set-covering problem (NP-hard)
SQL ranking
– Relevancy between keywords and attributes
– Attribute weight
Algorithms
– Greedy algorithms
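The slide names weighted set covering and greedy algorithms without giving details. The Python sketch below shows the textbook greedy heuristic for weighted set cover, which is one plausible reading and not necessarily the authors' algorithm. The candidate mappings, their names and their costs are hypothetical; a cost could, for instance, be derived from the inverse of a relevancy score.

```python
import math


def greedy_weighted_set_cover(keywords, candidates):
    """Pick mappings until every keyword is covered.

    candidates: mapping name -> (set of keywords it covers, cost).
    Each round takes the candidate with the lowest cost per newly covered keyword.
    """
    uncovered = set(keywords)
    chosen = []
    while uncovered:
        best, best_ratio = None, math.inf
        for name, (covered, cost) in candidates.items():
            gain = len(covered & uncovered)
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            raise ValueError(f"keywords cannot be covered: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= candidates[best][0]
    return chosen


# Hypothetical mappings for the example query "keyword paper ir".
CANDIDATES = {
    "paper.title": ({"keyword", "ir"}, 2.0),
    "keyword.name": ({"keyword"}, 0.5),
    "table:paper": ({"paper"}, 0.4),
    "paper.abstract": ({"ir"}, 1.0),
}
print(greedy_weighted_set_cover(["keyword", "paper", "ir"], CANDIDATES))
```

The greedy choice gives an approximate cover rather than an optimal one, which fits a problem the slide marks as NP-hard.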

25 Search: dbease
http://dbease.cs.tsinghua.edu.cn
Keyword Search:
– http://dbease.cs.tsinghua.edu.cn/ipubmed/
– http://dbease.cs.tsinghua.edu.cn/dblpsearch/
Form-based Search: http://dbease.cs.tsinghua.edu.cn/seaform/
SQL: http://dbease.cs.tsinghua.edu.cn/sqlsugg/

27 Differences to Google Instant Search
– Fuzzy prefix matching
– Google first predicts queries and then uses the top predicted queries to search the documents.
– Google may produce false negatives, while we can find the accurate top-k answers.

28 Differences to Complete Search
– Fuzzy prefix matching
– Different index structures
– More efficient

29 Differences to Keyword Search
Effectiveness
– SQL Suggestion supports range queries and aggregation functions.
– SQL Suggestion can group answers.
– SQL Suggestion can help users express their query intent more accurately.
Efficiency
– Faster

