Supporting search-as-you-type using SQL in databases

Supporting search-as-you-type using SQL in databases
Guoliang Li, Chen Li

Introduction
A search-as-you-type system computes answers on the fly as the user types a keyword query character by character.
The work covers both single-keyword and multi-keyword queries.
It studies how to support search-as-you-type in a relational database using SQL.
Example: Netflix.

Contents
Introduction
Preliminaries
Problem formulation
Exact search for single keyword
Fuzzy search for single keyword
Supporting multi-keyword queries
Supporting first-N queries
Supporting updates efficiently
Related work
Conclusion

Preliminaries
Let T be a relational table with attributes A1, A2, ..., Al.
R = {r1, r2, ..., rn} is the collection of records in T.
ri[Aj] denotes the content of record ri in attribute Aj.
W is the set of tokenized keywords in R.

Relational table (T)
ID | Title | Authors | Book title | Year
r1 | K-Automorphism: A General Framework for Privacy Preserving Network Publication | Lei Zou, Lei Chen, M. Tamer Özsu | PVLDB | 2009
r2 | Privacy-Preserving Singular Value Decomposition | Shuguo Han, Wee Keong Ng, Philip S. Yu | ICDE |
r3 | Privacy Preservation of Aggregates in Hidden Databases: Why and How? | Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri | SIGMOD |
r4 | Privacy-preserving Indexing of Documents on the Network | Mayank Bawa, Roberto J. Bayardo, Rakesh Agrawal, Jaideep Vaidya | VLDBJ |
r5 | On Anti-Corruption Privacy Preserving Publication | Yufei Tao, Xiaokui Xiao, Jiexing Li, Donghui Zhang | | 2008
r6 | Preservation of Proximity Privacy in Publishing Numerical Sensitive Data | Jiexing Li, Yufei Tao, Xiaokui Xiao | |
r7 | Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking | Feifei Li, Jimeng Sun, Spiros Papadimitriou, George A. Mihaila, Ioana Stanoi | SIGIR | 2007

Search-as-you-type for single-keyword queries
Exact search: as the user types a single partial keyword w character by character, the search-as-you-type system finds the records that contain a keyword with prefix w. This is also known as prefix search.
E.g., if the user types the query "sig", the system returns records r3, r6, and r7.

Search-as-you-type for single-keyword queries
Fuzzy search: as the user types a single partial keyword w character by character, the system finds the records with keywords similar to the query keyword.
E.g., when the user types the query "corel", record r7 is a relevant answer because it contains the keyword "correlation".
Similarity between two strings s1 and s2 is measured by their edit distance, ed(s1, s2).
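The edit distance mentioned above can be illustrated with a short sketch. This is the standard dynamic-programming (Levenshtein) formulation; the function name `edit_distance` is an illustrative choice, not something taken from the slides.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn s1 into s2."""
    prev = list(range(len(s2) + 1))          # distances from "" to each prefix of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]                           # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,         # delete c1
                            curr[j - 1] + 1,     # insert c2
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]
```

For the slide's example, the query "corel" is one insertion away from "correl", a prefix of "correlation", which is why r7 counts as a fuzzy match.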

Search-as-you-type for multi-keyword queries
Exact search: given a multi-keyword query Q with m keywords w1, w2, ..., wm, the last keyword wm is treated as a partial keyword and the others as complete keywords.
Fuzzy search: the system finds the records that contain keywords similar to the complete keywords and a keyword with a prefix similar to the partial keyword wm. Similarity is measured using the edit distance.

Different approaches for search-as-you-type
1. Use a separate application layer. It can achieve high performance.
2. Use database extenders. This is not a safe method for the query engine, and it depends on the API of the specific database.
3. Use SQL. It is more compatible because it uses standard SQL, and it is more portable across platforms than the other methods.

Exact search for single keyword
There are two kinds of methods: no-index methods and index-based methods.
No-index methods issue a SQL query that scans each record and verifies whether the record is an answer to the query.
1. Calling user-defined functions (UDFs): functions are added to the database.
2. Using the LIKE predicate: the LIKE predicate allows users to perform string matching, and it is used to check whether a record contains the query keyword.
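The two no-index methods can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption made for illustration: the table layout, the UDF name `HAS_PREFIX`, the tokenizer, and the choice of SQLite (whose LIKE is case-insensitive for ASCII by default). The slides do not give the actual queries.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (rid TEXT PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [
    ("r2", "Privacy-Preserving Singular Value Decomposition"),
    ("r6", "Preservation of Proximity Privacy in Publishing Numerical Sensitive Data"),
    ("r7", "Hiding in the Crowd: Privacy Preservation on Evolving Streams "
           "through Correlation Tracking"),
])

def has_prefix(text, w):
    """Hypothetical UDF: 1 if any tokenized keyword of text starts with w."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return int(any(tok.startswith(w.lower()) for tok in tokens))

# Register the Python function so SQL can call it while scanning T.
conn.create_function("HAS_PREFIX", 2, has_prefix)

def search_like(w):
    # LIKE only sees characters, so this matches w at the start of the
    # title or right after a space; other delimiters are missed.
    rows = conn.execute(
        "SELECT rid FROM T WHERE title LIKE ? OR title LIKE ? ORDER BY rid",
        (w + "%", "% " + w + "%"))
    return [r[0] for r in rows]

def search_udf(w):
    rows = conn.execute(
        "SELECT rid FROM T WHERE HAS_PREFIX(title, ?) = 1 ORDER BY rid", (w,))
    return [r[0] for r in rows]
```

In this sketch `search_like("pres")` misses r2, because "Preserving" follows a hyphen rather than a space, while `search_udf("pres")` finds it. That is one reason a scan-based approach may prefer a UDF that tokenizes the text. Both variants still scan every record.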

Index-based method
Auxiliary tables are built as index structures: an inverted-index table and a prefix table.
Inverted-index table (IT)
A unique id is assigned to each keyword in table T.
Records of IT have the form <kid, rid>, where kid is the id of a keyword and rid is the id of a record that contains it.

Prefix table (PT)
Records of PT have the form <p, lkid, ukid>, where p is a prefix of a keyword, lkid is the smallest id of the keywords with prefix p, and ukid is the largest id of the keywords with prefix p.
The prefix table is used to find the range of keywords with a given prefix.
E.g., the ids of the keywords with prefix "sig" must be in the range [k6, k7], so PT contains <"sig", k6, k7>.
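A minimal sketch of how the two auxiliary tables could be built is shown below. It assumes keyword ids are assigned in alphabetical order, so that all keywords sharing a prefix receive a contiguous id range; the slides only say that ids are unique. The tokenizer and the function names are also illustrative.

```python
import re

def tokenize(text):
    """Split text into lowercase alphanumeric keywords (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """records maps rid -> text.
    Returns (keywords, inverted, prefix_table) where
      keywords     is the sorted keyword list (kid = position + 1),
      inverted     is a sorted list of (kid, rid) pairs,
      prefix_table maps prefix -> (lkid, ukid)."""
    keywords = sorted({tok for text in records.values() for tok in tokenize(text)})
    kid_of = {kw: i for i, kw in enumerate(keywords, start=1)}

    inverted = sorted({(kid_of[tok], rid)
                       for rid, text in records.items()
                       for tok in tokenize(text)})

    prefix_table = {}
    for kw in keywords:
        kid = kid_of[kw]
        for n in range(1, len(kw) + 1):
            p = kw[:n]
            lkid, ukid = prefix_table.get(p, (kid, kid))
            prefix_table[p] = (min(lkid, kid), max(ukid, kid))
    return keywords, inverted, prefix_table
```

Because the ids follow sorted order, one (lkid, ukid) pair per prefix is enough to describe every keyword that starts with that prefix.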

Steps for exact search
For a given partial keyword w:
1. Get the keyword range [lkid, ukid] of w using the prefix table PT.
2. Find the records that have a keyword in that range through the inverted-index table IT.
The following SQL answers the prefix-search query w:
    SELECT T.* FROM PT, IT, T
    WHERE PT.prefix = 'w' AND PT.ukid >= IT.kid AND PT.lkid <= IT.kid AND IT.rid = T.rid;
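The query above can be tried end to end with a small sqlite3 sketch. The sample rows, the integer ids, and the helper name `prefix_search` are assumptions for illustration. The sketch also adds DISTINCT, which the slide's query does not have, so that a record containing several keywords in the range is returned once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T  (rid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE IT (kid INTEGER, rid INTEGER);
CREATE TABLE PT (prefix TEXT PRIMARY KEY, lkid INTEGER, ukid INTEGER);
-- Assumed keyword ids: 1 = icde, 2 = privacy, 3 = sigir, 4 = sigmod.
INSERT INTO T  VALUES (1, 'privacy icde'), (2, 'sigmod privacy'), (3, 'sigir');
INSERT INTO IT VALUES (1, 1), (2, 1), (2, 2), (4, 2), (3, 3);
-- Only a few prefixes are listed to keep the example short.
INSERT INTO PT VALUES ('i', 1, 1), ('p', 2, 2), ('sig', 3, 4);
""")

def prefix_search(w):
    """Return the records of T that contain a keyword with prefix w."""
    return conn.execute(
        """SELECT DISTINCT T.* FROM PT, IT, T
           WHERE PT.prefix = ?
             AND PT.ukid >= IT.kid AND PT.lkid <= IT.kid
             AND IT.rid = T.rid
           ORDER BY T.rid""",
        (w,)).fetchall()
```

Passing the prefix as a bound parameter rather than splicing it into the SQL string keeps the sketch safe against quoting problems in user input.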

The inverted-index table and prefix table
Keyword table (kid, keyword): k1 icde; k2 icdt; k3 preserving; k4 privacy; k5 publishing; k6 sigmod; k7 sigir
Prefix table (prefix, lkid, ukid): ic k1 k2; p k3 k6; pr k4; pri k4; pu k5; pv k6; pvl
Inverted-index table (kid, rid): k2 r10; k5 r6 r8; k6 r1; k7 r9; k8 r3

Fuzzy search for single keyword
No-index methods
The LIKE predicate does not support fuzzy search, so a UDF (user-defined function) is used instead.
PED(w, s) returns the minimal edit distance between w and the prefixes of the keywords in s.
E.g., PED("pvb", r10[title]) = PED("pvb", "privacy in database publishing") = 1.
To apply an edit-distance threshold, a second UDF, PEDTH(w, s, t), returns true if that edit distance is within t.
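The two UDFs can be sketched in plain Python as follows. This is a brute-force illustration of the definitions above rather than an efficient implementation: the tokenizer is assumed, only non-empty prefixes are considered, and the lowercase names `ped` and `pedth` stand in for the UDFs PED and PEDTH.

```python
import re

def edit_distance(s1, s2):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (c1 != c2)))
        prev = curr
    return prev[-1]

def ped(w, s):
    """Minimal edit distance between w and any non-empty prefix of any
    keyword in s. Falls back to len(w) if s has no keywords."""
    tokens = re.findall(r"[a-z0-9]+", s.lower())
    return min((edit_distance(w, tok[:n])
                for tok in tokens
                for n in range(1, len(tok) + 1)),
               default=len(w))

def pedth(w, s, t):
    """True if the prefix edit distance between w and s is within t."""
    return ped(w, s) <= t
```

For the slide's example, the prefix "pub" of "publishing" is one substitution away from "pvb", which gives a prefix edit distance of 1.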

Index-based method
The inverted-index table and prefix table are used to support fuzzy search-as-you-type:
1. Find the prefixes similar to the query keyword in the prefix table (PT).
2. Get the keyword ranges of the similar prefixes.
3. Find the answers based on the keyword ranges using the inverted-index table.
Methods for finding similar prefixes:
Using a UDF
Gram-based method
Neighborhood-generation method

Using a UDF
A UDF is used to find similar prefixes: a SQL query scans each prefix in PT and calls the UDF to check whether the prefix is similar to w.
Gram-based method
q-gram-based methods are used for approximate string matching. For a string s, its q-grams are its substrings of length q.
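A small sketch of q-gram extraction follows, together with the standard count filter from the approximate-string-matching literature: two strings within edit distance tau share at least max(|s1|, |s2|) - q + 1 - q*tau q-grams. The filter is included as a generally known pruning rule; the slides do not say which exact filter the gram-based method uses, and all function names are illustrative.

```python
from collections import Counter

def qgrams(s, q=2):
    """All substrings of s with length q, in order (empty if len(s) < q)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def common_qgrams(s1, s2, q=2):
    """Number of q-grams shared by s1 and s2, counted as multisets."""
    shared = Counter(qgrams(s1, q)) & Counter(qgrams(s2, q))
    return sum(shared.values())

def passes_count_filter(s1, s2, tau, q=2):
    """Necessary condition for edit distance <= tau. Pairs that fail can be
    pruned; pairs that pass still need an exact edit-distance check."""
    needed = max(len(s1), len(s2)) - q + 1 - q * tau
    return common_qgrams(s1, s2, q) >= needed
```

The filter only prunes candidates. Short strings make the required count drop to zero or below, in which case every pair passes and must be verified directly.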

Neighborhood-generation method
i-deletion neighborhood: for a given keyword w, the set of strings obtained by deleting i characters from w is called the i-deletion neighborhood of w, denoted Di(w).
E.g., for the keyword "pvldb":
D0(pvldb) = {pvldb}
D1(pvldb) = {vldb, pldb, pvdb, pvlb, pvld}
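The definition above can be checked with a short sketch. The candidate test at the end relies on a known property of deletion neighborhoods: if two strings are within edit distance tau, their neighborhoods of up to tau deletions share at least one string. The converse does not hold, so candidates still need verification. Function names are illustrative.

```python
from itertools import combinations

def deletion_neighborhood(w, i):
    """Di(w): all strings obtained by deleting exactly i characters from w."""
    result = set()
    for drop in combinations(range(len(w)), i):
        dropped = set(drop)
        result.add("".join(ch for k, ch in enumerate(w) if k not in dropped))
    return result

def neighborhoods_up_to(w, tau):
    """Union of D0(w), D1(w), ..., Dtau(w)."""
    out = set()
    for i in range(tau + 1):
        out |= deletion_neighborhood(w, i)
    return out

def may_be_similar(s1, s2, tau):
    """Candidate test: True if the neighborhoods overlap. A True result
    must still be confirmed with an exact edit-distance computation."""
    return not neighborhoods_up_to(s1, tau).isdisjoint(neighborhoods_up_to(s2, tau))
```

The neighborhood size grows quickly with i and with the keyword length, so in practice tau is kept small.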

Supporting multi-keyword queries
Computing answers from scratch
Q is a multi-keyword query with m keywords w1, w2, ..., wm.
1. Using the INTERSECT operator: first find the records for each keyword, then use the INTERSECT operator to combine the record sets.
2. Using full-text indexes: first find the records matching the first m-1 complete keywords, then find the records matching the last (prefix) keyword, and join the results. This uses the CONTAINS command.
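The INTERSECT approach can be sketched with sqlite3 as below. The sample data and the helper name are assumptions. As a simplification, every keyword (including the complete ones) is looked up through the prefix table, which means a complete keyword would also match longer keywords that start with it; the slides do not spell out how complete keywords are matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T  (rid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE IT (kid INTEGER, rid INTEGER);
CREATE TABLE PT (prefix TEXT PRIMARY KEY, lkid INTEGER, ukid INTEGER);
-- Assumed keyword ids: 1 = icde, 2 = privacy, 3 = sigir, 4 = sigmod.
INSERT INTO T  VALUES (1, 'privacy icde'), (2, 'sigmod privacy'), (3, 'sigir');
INSERT INTO IT VALUES (1, 1), (2, 1), (2, 2), (4, 2), (3, 3);
INSERT INTO PT VALUES ('ic', 1, 1), ('privacy', 2, 2), ('sig', 3, 4);
""")

# One sub-query per keyword: the ids of the records matching that keyword.
PER_KEYWORD = ("SELECT IT.rid FROM PT, IT "
               "WHERE PT.prefix = ? AND IT.kid BETWEEN PT.lkid AND PT.ukid")

def multi_keyword_search(keywords):
    """Ids of the records that match every keyword in the list."""
    if not keywords:
        return []
    sql = " INTERSECT ".join([PER_KEYWORD] * len(keywords)) + " ORDER BY 1"
    return [r[0] for r in conn.execute(sql, keywords)]
```

INTERSECT also removes duplicate record ids, so no DISTINCT is needed in the sub-queries.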

Word-level incremental computation
The user types a query Q with keywords w1, w2, ..., wm.
A temporary table CQ is created to cache the record ids of the answers to query Q.
If the user types a new keyword and submits a new query Q1, the temporary table is used to answer the new query incrementally.
There are two types of word-level incremental computation: exact search and fuzzy search.
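A rough sketch of the caching idea for exact search is given below, again using sqlite3. The table layouts, sample rows, and the helper names `start_query` and `extend_query` are assumptions; the slides only state that a temporary table CQ caches the record ids of Q.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IT (kid INTEGER, rid INTEGER);
CREATE TABLE PT (prefix TEXT PRIMARY KEY, lkid INTEGER, ukid INTEGER);
-- Assumed keyword ids: 1 = icde, 2 = privacy, 3 = sigir, 4 = sigmod.
INSERT INTO IT VALUES (1, 1), (2, 1), (2, 2), (4, 2), (3, 3);
INSERT INTO PT VALUES ('ic', 1, 1), ('privacy', 2, 2), ('sig', 3, 4);
-- CQ caches the record ids that answer the query typed so far.
CREATE TEMP TABLE CQ (rid INTEGER PRIMARY KEY);
""")

RANGE_JOIN = "PT.prefix = ? AND IT.kid BETWEEN PT.lkid AND PT.ukid"

def start_query(w):
    """Answer the first keyword from scratch and cache the record ids."""
    conn.execute("DELETE FROM CQ")
    conn.execute("INSERT INTO CQ SELECT DISTINCT IT.rid FROM PT, IT WHERE "
                 + RANGE_JOIN, (w,))

def extend_query(w):
    """Answer the query extended by keyword w using only cached records."""
    rows = conn.execute(
        "SELECT DISTINCT CQ.rid FROM CQ, IT, PT "
        "WHERE IT.rid = CQ.rid AND " + RANGE_JOIN + " ORDER BY CQ.rid", (w,))
    return [r[0] for r in rows]
```

The benefit is that the extended query only joins against the cached ids in CQ, which is usually far smaller than the full inverted-index table.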

Supporting first-N queries
As the user types a query character by character, the system returns the first N results as instant feedback.
Exact first-N queries: the LIMIT syntax is used to return the first N results from the database. E.g., "LIMIT n1, n2" skips the first n1 rows and returns the next n2 rows.
Fuzzy first-N queries: for a single-keyword query w, first find the results with edit distance 0. If there are N answers, terminate the execution; otherwise progressively increase the edit-distance threshold, selecting the records with edit distance 1, 2, ..., t until N answers are found.
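The progressive strategy for fuzzy first-N queries can be sketched as follows. It is a scan-based illustration under several assumptions: a prefix-edit-distance function is registered as a SQLite UDF named `PED`, the sample titles are made up, and each round selects the records at exactly the current distance so that no record is returned twice.

```python
import re
import sqlite3

def edit_distance(s1, s2):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (c1 != c2)))
        prev = curr
    return prev[-1]

def ped(w, s):
    """Minimal edit distance between w and any keyword prefix in s."""
    tokens = re.findall(r"[a-z0-9]+", s.lower())
    return min((edit_distance(w, tok[:n])
                for tok in tokens for n in range(1, len(tok) + 1)),
               default=len(w))

conn = sqlite3.connect(":memory:")
conn.create_function("PED", 2, ped)
conn.executescript("""
CREATE TABLE T (rid INTEGER PRIMARY KEY, title TEXT);
INSERT INTO T VALUES (1, 'privacy preservation'),
                     (2, 'private networks'),
                     (3, 'sigmod record');
""")

def fuzzy_first_n(w, n, max_t):
    """Collect up to n record ids, raising the edit distance step by step."""
    answers = []
    for t in range(max_t + 1):
        rows = conn.execute(
            "SELECT rid FROM T WHERE PED(?, title) = ? ORDER BY rid LIMIT ?",
            (w, t, n - len(answers))).fetchall()
        answers.extend(r[0] for r in rows)
        if len(answers) >= n:
            break            # enough answers: stop before looser thresholds
    return answers
```

With the sample data, the query "privaci" is at distance 1 from record 1 and distance 2 from record 2, so asking for one answer stops after the threshold reaches 1.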

Supporting updates efficiently
Data updates include the insertion and deletion of records.
Insertion: when a record is inserted, first assign it a new record id. Then insert each of its keywords into the inverted-index table. If a prefix of a keyword is not in the prefix table, add that prefix to the prefix table.
Deletion: when a record is deleted, the inverted-index table uses a bit to denote whether the record is deleted. The deleted prefix ids can be used for future insertions.
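The deletion bit can be sketched with sqlite3 as below. This covers only the inverted-index side and assumes the keywords of an inserted record already have ids; assigning ids to brand-new keywords and extending the prefix table are left out because the slides do not describe that scheme. The column name `deleted` and the helper names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IT (
    kid     INTEGER,
    rid     INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0   -- the bit marking deleted records
)""")

def insert_record(rid, kids):
    """Add one inverted-index entry per keyword id of the new record."""
    conn.executemany("INSERT INTO IT (kid, rid) VALUES (?, ?)",
                     [(kid, rid) for kid in kids])

def delete_record(rid):
    """Mark the record's entries as deleted instead of removing them."""
    conn.execute("UPDATE IT SET deleted = 1 WHERE rid = ?", (rid,))

def records_with_keyword(kid):
    """Ids of live (not deleted) records that contain the keyword."""
    rows = conn.execute(
        "SELECT rid FROM IT WHERE kid = ? AND deleted = 0 ORDER BY rid", (kid,))
    return [r[0] for r in rows]
```

Flagging rows keeps deletion cheap, at the cost of an extra predicate on every lookup until the flagged rows are cleaned up.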

Related work
1. Auto-completion and search-as-you-type: an auto-completion system predicts a word or phrase that the user may type next, based on the partial string the user has already typed.
2. Approximate string search and similarity join: given a set of strings and a query string, approximate string search finds all strings in the set that are similar to the query string.
3. Keyword search in databases.

Conclusion
Existing DBMS functionalities need to be leveraged to meet the high-performance requirement of achieving an interactive speed.
To support prefix matching, the proposed solutions use auxiliary tables as index structures.
The techniques were extended to the case of fuzzy queries.
Incremental-computation techniques were proposed to answer multi-keyword queries.
The work also studied how to support first-N queries.


thank you