Keyword Searching and Browsing in Databases using BANKS

Charuta Nakhe
Joint work with: Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan
I.I.T. Bombay
2/22/2019

Motivation
- Keyword search of documents on the Web has been enormously successful
  - Simple and intuitive; no need to learn any query language
- Database querying using keywords is desirable
  - SQL is not appropriate for casual users
  - Form interfaces are cumbersome:
    - Require a separate form for each type of query, which is confusing for casual users of Web information systems
    - Not suitable for ad hoc queries

Motivation
- Many Web documents are dynamically generated from databases
  - E.g. catalog data
- Keyword querying of generated Web documents
  - May miss answers that need to combine information on different pages
  - Suffers from duplication overheads

Examples of Keyword Queries
- On a railway reservation database: “mumbai bangalore”
- On a university database: “database course”
- On an e-store database: “camcorder panasonic”
- On a book store database: “sudarshan databases”

Differences from IR/Web Search
- Related data split across multiple tuples due to normalization
  - E.g. Paper(paper-id, title, journal), Author(author-id, name), Writes(author-id, paper-id, position), Cites(citing-paper-id, cited-paper-id)
- Different keywords may match tuples from different relations
  - What joins are to be computed can only be decided on the fly

Connectivity
- Tuples may be connected by
  - Foreign key and object references
  - Inclusion dependencies and join conditions
  - Implicit links (shared words), etc.
- Would like to find sets of (closely) connected tuples that match all given keywords

Basic Model
- Database: modeled as a graph
  - Nodes = tuples
  - Edges = references between tuples (foreign keys, inclusion dependencies, ...)
  - Edges are directed
[Figure: example graph with paper nodes “BANKS: Keyword search…” and “MultiQuery Optimization”, writes nodes, and author nodes Charuta, S. Sudarshan, and Prasan Roy]
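The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node ids (`p1`, `w1`, ...), the `TUPLES` and `REFERENCES` toy data, and the helper names `build_graph` and `match_keyword` are hypothetical and do not come from the BANKS implementation. Keyword matching is reduced to a case-insensitive substring test.

```python
from collections import defaultdict

# Hypothetical toy data modeled on the slide's example: node id -> (relation, text).
TUPLES = {
    "p1": ("Paper", "MultiQuery Optimization"),
    "a1": ("Author", "S. Sudarshan"),
    "a2": ("Author", "Prasan Roy"),
    "w1": ("Writes", ""),
    "w2": ("Writes", ""),
}

# Foreign-key references as (referencing tuple, referenced tuple) pairs.
REFERENCES = [("w1", "p1"), ("w1", "a1"), ("w2", "p1"), ("w2", "a2")]


def build_graph(references):
    """Return the directed graph as adjacency sets: node -> referenced nodes."""
    forward = defaultdict(set)
    for src, dst in references:
        forward[src].add(dst)
    return forward


def match_keyword(tuples, keyword):
    """Return the set of node ids whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {node for node, (_, text) in tuples.items() if needle in text.lower()}
```

Calling `match_keyword(TUPLES, "roy")` gives the node set for one keyword; one such set per keyword is the input to the search.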

Answer Example
Query: sudarshan roy
[Figure: answer tree connecting the paper “MultiQuery Optimization” through two writes tuples to the authors S. Sudarshan and Prasan Roy]

The BANKS Answer Model
- Query: set of keywords {k1, k2, .., kn}
  - Each keyword ki matches a set of nodes Si
- Answer: rooted, directed tree connecting nodes, with one node from each Si
  - Root node has special significance; may be restricted to some relations
    - E.g. relations representing entities, not relationships
  - May include intermediate nodes not in any Si, and hence is a Steiner tree
- Multiple answers
  - Ranking based on proximity + prestige

Edge Directionality
- Some popular tuples are connected to many other tuples
  - E.g. students -> departments -> university
- Popular tuples would create misleading shortcuts from every tuple to every other
  - E.g. every student would be closely linked with every other student via the department/university
- Solution: define different forward and backward edge weights
  - Forward edges: in the direction of the foreign key reference

Edge Weight
- Weight of a forward edge is based on the schema
  - e.g. citation link weights > “writes” link weights
- Weight of a backward edge = number of edges pointing to the referenced node (its indegree)
[Figure: example graph with edge weights 3, 1, 1, 1]
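A minimal sketch of this weighting rule, assuming a single uniform forward weight instead of per-schema weights. The function name `weighted_edges` and the `forward_weight` parameter are hypothetical. If a forward and a backward edge happen to coincide on the same node pair, this sketch keeps the smaller weight, which is an assumption made here.

```python
from collections import Counter


def weighted_edges(references, forward_weight=1.0):
    """Map each directed edge (u, v) to a weight.

    references: (referencing, referenced) pairs, i.e. forward edges.
    Each forward edge gets `forward_weight`; the matching backward edge
    gets the indegree of the referenced node.
    """
    indegree = Counter(dst for _, dst in references)
    edges = {}
    for src, dst in references:
        candidates = (
            ((src, dst), forward_weight),          # forward edge
            ((dst, src), float(indegree[dst])),    # backward edge
        )
        for key, weight in candidates:
            # Keep the smaller weight if the same pair shows up twice.
            edges[key] = min(weight, edges.get(key, weight))
    return edges
```

With three students referencing one department, each student-to-department edge costs 1 while each department-to-student edge costs 3, so paths that shortcut through the popular node become expensive.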

Edge Weight Scaling
- Problem: some backward edges have unduly large weights
  - Scale edge weights by using log(1 + raw-edge-weight)
- total-edge-weight = Σ edge-weights
- Edge score E = 1 / total-edge-weight
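The scaling and edge score can be written out directly. This sketch assumes the natural logarithm (the slide does not state a base) and assumes the answer tree has at least one edge; the function name `edge_score` is hypothetical.

```python
import math


def edge_score(raw_edge_weights):
    """Edge score E = 1 / sum(log(1 + w)) over the raw weights of a tree's edges."""
    total = sum(math.log(1 + w) for w in raw_edge_weights)
    if total == 0:
        # A single-node tree has no edges; the slide's formula does not cover it.
        raise ValueError("edge score is undefined for a tree with no weighted edges")
    return 1.0 / total
```

A tree that uses a heavy backward edge scores lower than one that uses only light edges, but the logarithm keeps the penalty from growing linearly with indegree.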

Node Weight
- Nodes have prestige weights too
  - Observation: nodes with intuitively greater prestige tend to have greater indegree
  - Set node weight = indegree
- Problem: nodes with many in-edges result in skewed answers
  - Subdue extreme node weights by using log(1 + indegree)
- Node score N = root-node-weight + Σ leaf-node-weights
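As a rough illustration of the node score, the sketch below applies the log(1 + indegree) damping to the root and to each leaf and adds the results. The natural logarithm, the function name `node_score`, and the dictionary-based `indegree` argument are assumptions made for this example.

```python
import math


def node_score(indegree, root, leaves):
    """Node score N = weight(root) + sum of weight(leaf), with weight = log(1 + indegree)."""
    def weight(node):
        # Nodes missing from the map are treated as having indegree 0.
        return math.log(1 + indegree.get(node, 0))

    return weight(root) + sum(weight(leaf) for leaf in leaves)
```

For a paper with two incoming edges and two authors with one incoming edge each, the score is log 3 + 2 log 2.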

Combining Scores
- Problem: how to combine two independent metrics: node weight and edge weight
  - Normalize each to 0-1
  - Combine using weighting factor λ
    - Additive: (1 - λ) E + λ N
    - Multiplicative: E × N
- Performance study to compare alternatives and to find reasonable values for λ
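The two combination rules can be compared with a small sketch. The slide does not say how scores are normalized to 0-1, so dividing by the maximum is an assumption here, as are the function names. The weighting factor is called `lam`, and its default of 0.2 is the value the parameter study later in the deck reports as working best.

```python
def normalize(scores):
    """Scale non-negative scores into [0, 1] by dividing by the largest one (an assumption)."""
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def combine_additive(e, n, lam=0.2):
    """Additive rule: (1 - lam) * E + lam * N, for normalized E and N."""
    return (1 - lam) * e + lam * n


def combine_multiplicative(e, n):
    """Multiplicative rule as written on the slide: E * N."""
    return e * n
```

With `lam` near 0 the ranking is driven by proximity (edge score) alone; with `lam` near 1 it is driven by prestige (node score) alone.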

Finding Answer Trees
- Backward Expanding Search Algorithm:
  - Intuition: find vertices from which a forward path exists to at least one node from each Si
  - Run concurrent single-source shortest-path algorithms from each node matching a keyword
    - Create an iterator for each node matching a keyword
    - Traverse the graph edges in reverse direction
  - Output a node whenever it is in the intersection of the sets of nodes reached from each keyword
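The following is a simplified sketch of the backward expanding idea, not the BANKS implementation. It keeps one Dijkstra-style iterator per keyword-matching node, drives them all from a single priority queue, and follows edges in reverse. Unlike the real system it reports each root only once, and it uses the sum of shortest path lengths as the cost rather than building and scoring the full answer tree. The function name and argument shapes are hypothetical.

```python
import heapq
from collections import defaultdict


def backward_expanding_search(edges, keyword_nodes):
    """Return (root, cost) pairs for nodes that reach every keyword set.

    edges: dict mapping a directed edge (u, v) to a non-negative weight.
    keyword_nodes: list of node sets S_i, one per keyword.
    """
    # Index edges by target so they can be traversed in reverse.
    incoming = defaultdict(list)
    for (u, v), w in edges.items():
        incoming[v].append((u, w))

    # One logical iterator per (keyword, matching node), sharing one heap.
    heap = []
    for kw, nodes in enumerate(keyword_nodes):
        for origin in sorted(nodes):
            heapq.heappush(heap, (0.0, origin, kw, origin))

    settled = set()            # (keyword, origin, node) triples already expanded
    best = defaultdict(dict)   # node -> {keyword index: shortest distance}
    emitted = set()
    results = []

    while heap:
        dist, node, kw, origin = heapq.heappop(heap)
        if (kw, origin, node) in settled:
            continue
        settled.add((kw, origin, node))
        # Pops come out in non-decreasing distance, so the first pop of a
        # node for a keyword carries its shortest distance to that keyword.
        best[node].setdefault(kw, dist)
        if len(best[node]) == len(keyword_nodes) and node not in emitted:
            emitted.add(node)
            results.append((node, sum(best[node].values())))
        for pred, w in incoming[node]:
            if (kw, origin, pred) not in settled:
                heapq.heappush(heap, (dist + w, pred, kw, origin))
    return results
```

Because the iterators share one queue ordered by distance, nearby roots tend to be discovered before distant ones, although the results are not guaranteed to come out in cost order.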

Backward Expanding Search
Query: sudarshan roy
[Figure: graph with the paper “MultiQuery Optimization”, writes tuples, and authors S. Sudarshan and Prasan Roy]

Result Ordering
- Answer trees may not be generated in relevance order
- Solution:
  - Best-first search across all iterators, based on path length
  - Output answers to a buffer
  - Output highest ranked answer from buffer to user when buffer is full
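The buffering step can be sketched with a bounded heap. This is an assumption-laden illustration: the function name `buffered_output`, the `(score, answer)` input shape, and the choice to flush the remaining buffer in rank order at the end are all made up for the example.

```python
import heapq


def buffered_output(answers, buffer_size):
    """Yield answers in approximately descending score order.

    answers: iterable of (score, answer) pairs in generation order.
    Whenever the buffer fills up, the highest scoring buffered answer is emitted.
    """
    buffer = []  # min-heap on negated score, so the best answer pops first
    for seq, (score, answer) in enumerate(answers):
        # seq breaks ties so that answers themselves are never compared.
        heapq.heappush(buffer, (-score, seq, answer))
        if len(buffer) >= buffer_size:
            yield heapq.heappop(buffer)[2]
    # Generation is finished: drain what is left, best first.
    while buffer:
        yield heapq.heappop(buffer)[2]
```

A larger buffer gives an order closer to the true ranking at the price of a longer wait before the first answer is shown.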

The BANKS System
- BANKS provides keyword search coupled with extensive browsing facilities
  - Schema browsing + data browsing
  - Graphical display of data
- Implemented using Java + servlets
- Keyword search response times typically 1 to 3 seconds on DBLP database with 100,000 tuples/300,000 edges
  - P3 600 MHz, 512 MB RAM
- Try it out at www.cse.iitb.ac.in/banks/

Example of Browsing in BANKS

Anecdotes
- “Mohan”
  - Returns C. Mohan at top based on prestige (number of papers written)
- “Transaction”
  - Returns Jim Gray’s classic paper and textbook as top answers based on prestige (number of citations)
- “Sunita Seltzer”
  - No common papers, but both have papers with Stonebraker: system finds this connection

Effect of Parameters
- Log scaling of edge weights worked well
- (1 - λ) E + λ N versus E × N: made little difference
- Best with λ = 0.2 (subdue node weights but not entirely)
[Figure; surviving label: EdgeLog]

Related Work
- DataSpot (DTL)/Mercado Intuifind [VLDB 98]
  - Based on patent by Palmon (filed 1995, granted 1998)
  - Based on hypergraph model, similar answer model to ours
  - Differences: our model of backward link weights and prestige
- Proximity Search [VLDB98]
  - Different model of proximity based on adding up support
  - No edge weights, prestige; different evaluation algorithm
- Information units (linked Web pages) [WWW10]
  - No directionality, only studied in Web context
- Microsoft DBExplorer (this conference)
  - No ranking, based on SQL generation
  - Addresses efficient construction of text indexes
- Microsoft English Query

Conclusions and Future Work
- The next big wave: keyword searching and browsing of databases?
- Future work:
  - Keyword queries on XML
  - Disambiguating queries by selecting
    - Nodes: G.W.Bush: “Bush Jr” or “Bush Sr”
    - Tree structure: “coauthors” or “cites”
  - Boolean queries, stemming, thesaurus
  - Metadata: column/relation names

Thank You

BANKS Query Result Example
Result of “Soumen Sunita”

Browsing Features
- Hyperlinks are automatically added to all displayed results
- Template facilities to do a variety of tasks
  - Browsing data by grouping and creating crosstabs
    - e.g., theses grouped by department and year
  - Hierarchical views of data
    - Nested XML style, even on relational data
  - Graphical displays
    - Bar charts, pie charts, etc.
- Templates are generic and can be applied on any data matching assumed schema
  - Can be applied after applying selections
- New templates can be created by user, interactively

Combining Keyword Search and Browsing
- Catalog searching applications
  - Keywords may restrict answers to a small set, then user needs to browse answers
  - If there are multiple answers, hierarchical browsing required on the answers

The BANKS System
- Available on the web, with (part of) DBLP data
  - http://www.cse.iitb.ac.in/banks
- Connects to any database using JDBC
  - JDBC metadata features used to provide schema browsing
  - No programming needed for customization
  - Minimal preprocessing of database to create indices and give weights to links
- Extensive set of browsing features
[Figure: architecture diagram with labels User, HTTP, BANKS, JDBC, Web Server + Servlets, Database]