DBease: Making Databases User-Friendly and Easily Accessible Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng Database Group, Department of Computer.


DBease: Making Databases User-Friendly and Easily Accessible
Guoliang Li, Ju Fan, Hao Wu, Jiannan Wang, Jianhua Feng
Database Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China

How to Access Databases?
Traditional database-access methods:
– SQL
  Select title, author, booktitle, year
  From dblp
  Where title Contains “search” And booktitle Contains “cidr”
– Query-by-example (Form)
– Keyword Search: “search cidr”
CIDR'11 - DBease

Comparison of Different Methods
[Figure: comparison of the database-access methods; axis label: Usability]

Keyword Search
Is traditional keyword search good enough?
– Too many results!
– No result!

Form-based Search
Form-based Search has the same problem.
– Complicated and still no result!

Our Solution
[Figure: Type-Ahead Search, Type-Ahead Search in Forms, and SQL Suggestion; axis label: Usability]

What is Type-Ahead Search?

Type-Ahead Search
Advantages
– Giving users instant feedback on the fly
– Helping users navigate the underlying data
– Tolerating inconsistencies between query and data
– Supporting synonyms
– Supporting XML data
– Supporting multiple tables

Problem Formulation
Data: A set of records
Query
– Q = {p_1, p_2, …, p_l}: a set of prefixes
– δ: Edit-distance threshold
Result
– A set of records containing all query prefixes or their similar forms (conjunctive)
Edit Distance: The minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another, e.g., ed(string, stang) = 2
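
The edit-distance definition above can be made concrete with a short sketch. The function names `edit_distance` and `prefix_edit_distance` are illustrative and do not come from the slides. Reading "similar forms" of a prefix as "some prefix of a data word lies within the threshold δ of the query prefix" is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between query and any prefix of word (assumed reading)."""
    return min(edit_distance(query, word[:k]) for k in range(len(word) + 1))


print(edit_distance("string", "stang"))  # the example from the slide
```

Under this reading, a data word matches a query prefix p when `prefix_edit_distance(p, word) <= δ`.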

Indexing
Trie Index
– Words: root to leaves
– Inverted lists on leaves
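
A minimal sketch of such an index, assuming records are short strings split on whitespace. The names `TrieNode`, `build_trie` and `records_with_prefix` are hypothetical. The sketch attaches the inverted list to the node where a word ends, which is a leaf unless the word is itself a prefix of a longer word.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # character -> TrieNode
    record_ids: set = field(default_factory=set)   # inverted list of the word ending here


def build_trie(records):
    """records: dict mapping record id -> text."""
    root = TrieNode()
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.record_ids.add(rid)
    return root


def records_with_prefix(root, prefix):
    """Union of the inverted lists of all words that start with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return set()
    result, stack = set(), [node]
    while stack:                       # walk the subtree below the prefix node
        current = stack.pop()
        result |= current.record_ids
        stack.extend(current.children.values())
    return result


index = build_trie({1: "xml database", 2: "xml search", 3: "rdbms"})
print(records_with_prefix(index, "x"))
```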

Algorithm
Step 1: Find similar prefixes incrementally
Step 2: Retrieve the leaf nodes of similar prefixes
Step 3: Compute union lists of inverted lists of leaf nodes
Step 4: Intersect the union lists of query keywords
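
The four steps can be sketched as follows. This is a simplified, non-incremental version: it recomputes the similar prefixes for every query by walking the trie with one edit-distance row per node, whereas the slide describes an incremental computation as the user types. All names are hypothetical, and the trie is a plain nested dict in which the key "$" (assumed not to occur in the data) holds a word's inverted list.

```python
def build_trie(records):
    """Nested-dict trie; key "$" stores the inverted list of a complete word."""
    root = {}
    for rid, text in records.items():
        for word in text.lower().split():
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node.setdefault("$", set()).add(rid)
    return root


def subtree_records(node):
    """Steps 2 and 3: union of the inverted lists of all words below node."""
    ids = set(node.get("$", ()))
    for ch, child in node.items():
        if ch != "$":
            ids |= subtree_records(child)
    return ids


def fuzzy_prefix_records(root, prefix, delta):
    """Step 1: find trie nodes within edit distance delta of prefix, then union their records."""
    result = set()

    def visit(node, row):
        # row[j] = edit distance between this node's string and prefix[:j]
        if row[-1] <= delta:              # node is a similar prefix
            result.update(subtree_records(node))
            return
        for ch, child in node.items():
            if ch == "$":
                continue
            new = [row[0] + 1]
            for j in range(1, len(row)):
                cost = 0 if prefix[j - 1] == ch else 1
                new.append(min(new[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
            if min(new) <= delta:         # otherwise no descendant can match
                visit(child, new)

    visit(root, list(range(len(prefix) + 1)))
    return result


def type_ahead_search(root, prefixes, delta):
    """Step 4: intersect the union lists of all query prefixes."""
    lists = [fuzzy_prefix_records(root, p, delta) for p in prefixes]
    return set.intersection(*lists) if lists else set()


index = build_trie({1: "keyword search cidr", 2: "xml search vldb", 3: "type ahead search cidr"})
print(type_ahead_search(index, ["serch", "cid"], delta=1))
```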

Type-Ahead Search in Forms
[Figure: Type-Ahead Search and Type-Ahead Search in Forms; axis label: Usability]

What is Type-Ahead Search in Forms?

Type-Ahead Search in Forms
Problem Formulation
– Data: A relation with multiple attributes
– Query: A set of prefixes on attributes in a form interface
– Answers: Local results of the focused attribute; global results of the relation
Advantages
– On-the-fly Faceted Search
– Supporting Aggregation

Data Partition
Global Table → Local Tables

Global table:
ID  Title         Conf.   Author
1   xml database  VLDB    albert
2   xml database  SIGMOD  bob
3   xml search    VLDB    albert
4   xml security  VLDB    alice
5   rdbms         SIGMOD  charlie

Local table (Title):
ID  Title
T1  xml database
T2  xml search
T3  xml security
T4  rdbms

Local table (Conf.):
ID  Conf.
C1  VLDB
C2  SIGMOD

Local table (Author):
ID  Author
A1  albert
A2  bob
A3  alice
A4  charlie
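
The partition can be sketched in a few lines. The function name `partition`, the ID prefixes, and the two mapping dictionaries are illustrative. The slide only shows the resulting tables, so numbering local IDs by first appearance is an assumption that happens to reproduce the IDs on the slide.

```python
GLOBAL_TABLE = [
    (1, "xml database", "VLDB", "albert"),
    (2, "xml database", "SIGMOD", "bob"),
    (3, "xml search", "VLDB", "albert"),
    (4, "xml security", "VLDB", "alice"),
    (5, "rdbms", "SIGMOD", "charlie"),
]


def partition(rows, attributes, prefixes):
    """Split a global table into one local table of distinct values per attribute."""
    local_tables, l2g, g2l = {}, {}, {}
    for col, (attr, pre) in enumerate(zip(attributes, prefixes), start=1):
        ids = {}                                   # distinct value -> local id
        l2g[attr], g2l[attr] = {}, {}
        for row in rows:
            gid, value = row[0], row[col]
            # Reuse the local id of a known value, else assign the next one.
            lid = ids.setdefault(value, f"{pre}{len(ids) + 1}")
            l2g[attr].setdefault(lid, []).append(gid)   # local -> global ids
            g2l[attr][gid] = lid                        # global -> local id
        local_tables[attr] = {lid: value for value, lid in ids.items()}
    return local_tables, l2g, g2l


local_tables, l2g, g2l = partition(GLOBAL_TABLE, ("Title", "Conf.", "Author"), ("T", "C", "A"))
print(local_tables["Title"])
print(l2g["Title"])
```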

Indexing
Each attribute
– Trie
– Mapping Tables: Local → Global, Global → Local

Our Solution

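Our Solution (example)
[Figure: form with Title = "xml" and Author = "al". Local results for Author: albert, alice. Global results: (xml database, albert), (xml search, albert), (xml security, alice). The figure also shows the trie over the author names.]
L-G Mapping Table (Title): T1 → 1, 2; T2 → 3; T3 → 4; T4 → 5
G-L Mapping Table (Title): 1 → T1; 2 → T1; 3 → T2; 4 → T3; 5 → T4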
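
One way the mapping tables could be used to answer a form query is sketched below, with the example tables from the Data Partition slide. This is an assumption about the processing, not the authors' algorithm: a plain scan with exact prefix matching stands in for the per-attribute tries, and the names `matching_local_ids` and `form_search` are hypothetical.

```python
LOCAL = {
    "Title": {"T1": "xml database", "T2": "xml search", "T3": "xml security", "T4": "rdbms"},
    "Author": {"A1": "albert", "A2": "bob", "A3": "alice", "A4": "charlie"},
}
# Local-to-global mapping tables, derived from the global table on the slide.
L2G = {
    "Title": {"T1": [1, 2], "T2": [3], "T3": [4], "T4": [5]},
    "Author": {"A1": [1, 3], "A2": [2], "A3": [4], "A4": [5]},
}


def matching_local_ids(attr, prefix):
    """Local ids whose value contains a word starting with prefix (stand-in for the trie)."""
    return {lid for lid, value in LOCAL[attr].items()
            if any(word.startswith(prefix) for word in value.split())}


def form_search(form, focused):
    """form: attribute -> typed prefix. Returns (global record ids, local values of focused)."""
    global_ids = None
    for attr, prefix in form.items():
        gids = {g for lid in matching_local_ids(attr, prefix) for g in L2G[attr][lid]}
        global_ids = gids if global_ids is None else global_ids & gids
    global_ids = global_ids or set()
    # Global-to-local mapping of the focused attribute, inverted from L2G.
    g2l = {g: lid for lid, gs in L2G[focused].items() for g in gs}
    local_values = sorted({LOCAL[focused][g2l[g]] for g in global_ids})
    return sorted(global_ids), local_values


print(form_search({"Title": "xml", "Author": "al"}, focused="Author"))
```

For the form inputs Title = "xml" and Author = "al", the sketch yields global records 1, 3 and 4 and the local author values albert and alice.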

SQL Suggestion
[Figure: Type-Ahead Search, Type-Ahead Search in Forms, and SQL Suggestion; axis label: Usability]

What is SQL Suggestion?

SQL Suggestion
Problem Formulation
– Data: A database with multiple tables
– Query: A set of keywords
– Answers: Relevant SQL queries
Advantages
– Suggest SQL queries based on keywords
– Help users formulate SQL queries to find accurate results
– Designed for both SQL programmers and Internet users
– Group answers based on SQL structures
– Support aggregation
– Support range queries

Our Solution
Suggest Templates from Keywords
– A template is a structure in the databases
– Modeled as a graph
  Nodes: entities (table names or attribute names)
  Edges: foreign keys or membership
Suggest SQL queries from Templates
– Mapping between keywords and templates
[Figure: example with the text "keyword paper ir": (a) Query, (b) Template, (c) SQL]

Template Suggestion
Template Generation
– Extension from basic entities (tables)
Template Ranking
– Template weight: PageRank
– Relevancy between a keyword and an entity: TF*IDF
Algorithms
– Fagin algorithms
– Threshold-based pruning techniques
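
The slide names Fagin algorithms without further detail. As an illustration only, the sketch below shows the classic threshold algorithm (TA) for top-k aggregation over several score lists. Whether DBease uses this exact variant is an assumption, as are the function name `threshold_topk`, the use of a sum as the aggregate, non-negative scores, and a score of 0 for objects missing from a list.

```python
import heapq


def threshold_topk(lists, k):
    """lists: list of dicts object -> non-negative score. Returns top-k (object, total)."""
    # Sorted access: each list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    max_depth = max((len(s) for s in sorted_lists), default=0)
    seen = {}                                     # object -> aggregated score
    for depth in range(max_depth):
        threshold = 0.0                           # best total any unseen object can reach
        for s in sorted_lists:
            if depth < len(s):
                obj, score = s[depth]
                threshold += score
                if obj not in seen:
                    # Random access: look the object up in every list.
                    seen[obj] = sum(l.get(obj, 0.0) for l in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                            # early stop: unseen objects cannot win
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores: one list per ranking signal, templates t1..t3.
template_weight = {"t1": 0.9, "t2": 0.5, "t3": 0.1}
keyword_relevancy = {"t1": 0.2, "t2": 0.8, "t3": 0.3}
print([name for name, _ in threshold_topk([template_weight, keyword_relevancy], k=2)])
```

The early stop is the threshold-based pruning: once the k-th best total seen so far is at least the sum of the scores at the current depth, no unseen object can enter the top k.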

SQL Suggestion
SQL suggestion model
– Mapping from keywords to templates
– Matching is a set of mappings with all keywords
– Weighted set-covering problem (NP-hard)
SQL ranking
– Relevancy between keywords and attributes
– Attribute weight
Algorithms
– Greedy algorithms
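
Since weighted set cover is NP-hard, a greedy heuristic is the usual choice. The sketch below is the textbook greedy approximation, which repeatedly picks the candidate with the lowest cost per newly covered keyword. It is not necessarily the authors' algorithm. The candidate names, the costs, and the convention that a lower cost is better are all hypothetical.

```python
def greedy_weighted_cover(keywords, candidates):
    """candidates: name -> (cost, set of keywords it covers). Returns chosen names or None."""
    uncovered = set(keywords)
    chosen = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, (cost, covers) in candidates.items():
            gain = len(covers & uncovered)        # keywords this candidate newly covers
            if gain and cost / gain < best_ratio:
                best, best_ratio = name, cost / gain
        if best is None:
            return None                           # some keyword cannot be covered
        chosen.append(best)
        uncovered -= candidates[best][1]
    return chosen


# Hypothetical mappings from keywords to template entities, with made-up costs.
candidates = {
    "paper.title": (1.0, {"keyword", "ir"}),
    "paper (table name)": (0.8, {"paper"}),
    "author.name": (2.0, {"ir"}),
}
print(greedy_weighted_cover({"keyword", "paper", "ir"}, candidates))
```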

Search: dbease
– Keyword Search:
– Form-based Search:
– SQL:

Differences from Google Instant Search
– We support fuzzy prefix matching.
– Google first predicts queries and then uses the top predicted queries to search the documents.
– Google may therefore miss relevant answers (false negatives), while we can find the accurate top-k answers.

Differences from CompleteSearch
– Fuzzy prefix matching
– Different index structures
– More efficient

Differences from Keyword Search
Effectiveness
– SQL Suggestion supports range queries and aggregation functions.
– SQL Suggestion can group answers.
– SQL Suggestion can help users express their query intent more accurately.
Efficiency
– Faster