Harikrishnan Karunakaran Sulabha Balan CSE 6339.  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model 

Presentation transcript:

Harikrishnan Karunakaran Sulabha Balan CSE 6339

 Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model  Searching for the Best Answers ◦ Backward Expanding Search Algorithm  Browsing through BANKS  Experience and Performance  Related Work  Conclusion

 With the growth of the web, the number of users needing to access online databases has increased  Search engines have popularized unstructured querying, in which the user just types keywords and follows links  The same approach cannot be applied directly to databases, because querying them requires knowledge of the schema and of a query language such as SQL  Naive keyword search also does not work on databases, because the data for one answer is usually spread across several tables/tuples due to normalization

 BANKS: Browsing ANd Keyword Searching  Enables keyword-based search on relational databases along with data and schema browsing  The user interacts with the data by typing keywords, following hyperlinks and using the controls made available  No knowledge of query or programming languages is required of the user

 Makes joins implicit and transparent  Incorporates notions of proximity and prestige  Provides methods to publish relational data that would otherwise remain invisible on the web  Creates hierarchical and graphical views of data with hyperlinks to navigate through them

 The answer to a query should be a sub-graph connecting the nodes that match the keywords  The central node that connects all keyword nodes is the Information Node, and the tree is the Connection Tree  Foreign key dependencies constitute the edges in the graph  Edges (references) are given weights according to their type  The weight of a tree is proportional to the total of its edge weights, and its relevance is inversely proportional to its weight

 To obtain a model in which edges are directed away from the information node while preserving directionality, we make use of backward edges  A backward edge is assigned a weight proportional to indegree  The Information Node is selected from certain sets of nodes in the graph  Backward edges ensure that the tree is rooted at the Information Node  To avoid the problem of “hubs”, edges connecting popular nodes are given a higher weight, thus lowering proximity

 The concept of node weights is introduced to include prestige rankings  Nodes with more pointers to them are given higher prestige  In BANKS, node prestige is assigned based on the in-degree of the node  Node weights and tree weights are combined to obtain the relevance score

 Each tuple T has a corresponding node u_T  Each node u has a node weight N(u) that depends on the prestige of the node  For each pair of related tuples T_1 and T_2, where T_1 references T_2, the graph contains an edge from u_{T_1} to u_{T_2} and a back edge from u_{T_2} to u_{T_1}  The similarity s(R_1, R_2) from relation R_1 to relation R_2 depends on the type of link from R_1 to R_2 and is set to infinity if R_1 does not refer to R_2
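The formal model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `build_graph`, the tuple ids and the `(referencing, referenced)` pair format are hypothetical, and the node weight is taken to be the plain in-degree, as the slides describe for prestige.

```python
from collections import defaultdict

def build_graph(tuple_ids, references):
    """Build forward and back adjacency plus node weights.

    tuple_ids  : iterable of tuple identifiers (one graph node per tuple)
    references : list of (referencing_id, referenced_id) foreign-key pairs
    All names here are hypothetical, chosen for illustration only.
    """
    forward = defaultdict(set)   # u -> tuples that u references
    backward = defaultdict(set)  # v -> tuples that reference v (back edges v -> u)
    for src, dst in references:
        forward[src].add(dst)
        backward[dst].add(src)
    # Node weight N(u): in-degree of the node, used as its prestige (assumption).
    node_weight = {t: len(backward[t]) for t in tuple_ids}
    return forward, backward, node_weight

# Two "writes" tuples link one author to two papers.
tuple_ids = ["author1", "paper1", "paper2", "writes1", "writes2"]
references = [
    ("writes1", "author1"), ("writes1", "paper1"),
    ("writes2", "author1"), ("writes2", "paper2"),
]
forward, backward, node_weight = build_graph(tuple_ids, references)
```

In this toy data the author is referenced twice and therefore gets the highest prestige, while the linking tuples, which nothing references, get weight 0.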

Edge weights  Each link is given a value according to its importance; the default value is 1  The weight of the directed edge (u, v) depends on which references exist:  If (u, v) exists but (v, u) does not, assign the weight s(R(u), R(v)) to (u, v)  If (u, v) does not exist but (v, u) does, assign the weight IN_v(u) · s(R(v), R(u)) to (u, v), where IN_v(u) is the indegree of u contributed by the tuples belonging to relation R(v)  If both (u, v) and (v, u) exist in the graph, assign the minimum of the two values: min{s(R(u), R(v)), IN_v(u) · s(R(v), R(u))}
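The three cases for the weight of a directed edge (u, v) can be written out as a small function. This is a sketch, not the BANKS implementation: `edge_weight`, the `rel` mapping from node to relation name, the `refs` set of foreign-key pairs and the `similarity` table are all hypothetical names introduced for illustration.

```python
import math

def edge_weight(u, v, rel, refs, similarity):
    """Weight of the directed edge (u, v), following the three cases.

    rel        : dict node -> name of the relation the tuple belongs to
    refs       : set of (referencing, referenced) foreign-key pairs
    similarity : dict (R1, R2) -> s(R1, R2); a missing entry means infinity
    """
    def s(r1, r2):
        return similarity.get((r1, r2), math.inf)

    def indegree_from(node, relation):
        # IN_v(u): number of tuples of `relation` that reference `node`
        return sum(1 for a, b in refs if b == node and rel[a] == relation)

    has_forward = (u, v) in refs   # u references v
    has_reverse = (v, u) in refs   # v references u, so (u, v) is a back edge

    if has_forward and has_reverse:
        return min(s(rel[u], rel[v]),
                   indegree_from(u, rel[v]) * s(rel[v], rel[u]))
    if has_forward:
        return s(rel[u], rel[v])
    if has_reverse:
        return indegree_from(u, rel[v]) * s(rel[v], rel[u])
    return math.inf  # the two tuples are not related at all

rel = {"writes1": "Writes", "writes2": "Writes", "author1": "Author"}
refs = {("writes1", "author1"), ("writes2", "author1")}
similarity = {("Writes", "Author"): 1.0}  # default link value of 1
```

With this data the forward edge from `writes1` to `author1` costs 1, while the back edge from `author1` to `writes1` costs 2 because two `Writes` tuples point at the author. This is how back edges out of popular nodes become expensive.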

 Query  A set of keywords, e.g. {k_1, k_2, …, k_n}, used as search terms t_1, t_2, …, t_n  For each search term t_i, locate the set S_i of nodes matching it, giving the node sets {S_1, S_2, …, S_n}  Answer Model  A rooted directed tree connecting keyword nodes (at least one node from each S_i)  Note: the tree may also contain nodes not in any S_i, which makes it a Steiner tree  Relevance score of an answer tree  Combination of its node weights and its edge weights; answers are presented in decreasing order of relevance
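As an illustration of the query model, the node sets S_i can be computed by matching each search term against the text of every tuple. This sketch assumes whole-word, case-insensitive matching over an in-memory dictionary; the function name `keyword_node_sets` and the `node_text` mapping are hypothetical, and a real system would use an index rather than a scan.

```python
def keyword_node_sets(node_text, keywords):
    """Return one set S_i of matching nodes per search term t_i.

    node_text : dict node -> text content of the tuple (hypothetical layout)
    keywords  : list of search terms
    """
    sets = []
    for term in keywords:
        needle = term.lower()
        # A node matches when the term equals one of its whitespace-separated tokens.
        matches = {node for node, text in node_text.items()
                   if needle in text.lower().split()}
        sets.append(matches)
    return sets

node_text = {
    "author1": "Prasan Roy",
    "author2": "S. Sudarshan",
    "paper1": "Keyword search in databases",
}
node_sets = keyword_node_sets(node_text, ["Roy", "Sudarshan"])
```

An answer tree must then contain at least one node from each of the returned sets.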

 Calculating the relevance score involves scaling both node weights and edge weights, along with a factor λ for combining them  Node weights  Scaled by the maximum node weight N_max and optionally depressed using log  Nscore(v) = N(v)/N_max or log(1 + N(v)/N_max)  Overall Nscore is the average of the node scores  Edge weights  Normalized by dividing the edge weight by the minimum edge weight w_min, optionally depressed using log  Escore(e) = w(e)/w_min or log(1 + w(e)/w_min)  Overall Escore = 1/(1 + Σ_e Escore(e))  Combination of overall Escore and Nscore  Additive: (1 − λ)·Escore + λ·Nscore  Multiplicative: Escore × Nscore^λ
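The scoring formulas above translate directly into code. The following is a minimal sketch: the function names are hypothetical, the natural logarithm is an assumption (the slides do not state the base), and which nodes of the tree contribute to the average is left to the caller.

```python
import math

def node_score(node_weights, n_max, log_scale=False):
    """Overall Nscore: average of N(v)/N_max, optionally log-depressed."""
    scores = [w / n_max for w in node_weights]
    if log_scale:
        scores = [math.log(1 + x) for x in scores]
    return sum(scores) / len(scores)

def edge_score(edge_weights, w_min, log_scale=True):
    """Overall Escore: 1 / (1 + sum of normalized per-edge scores)."""
    per_edge = [w / w_min for w in edge_weights]
    if log_scale:
        per_edge = [math.log(1 + x) for x in per_edge]
    return 1 / (1 + sum(per_edge))

def relevance(escore, nscore, lam=0.2, multiplicative=False):
    """Combine the two scores additively or multiplicatively."""
    if multiplicative:
        return escore * nscore ** lam
    return (1 - lam) * escore + lam * nscore

# A tree with two edges of minimum weight and two scored nodes.
e = edge_score([1.0, 1.0], w_min=1.0, log_scale=False)  # 1 / (1 + 2)
n = node_score([2, 4], n_max=4)                          # (0.5 + 1.0) / 2
score = relevance(e, n, lam=0.2)                         # 0.8 * e + 0.2 * n
```

Because the overall edge score is the reciprocal of one plus the summed edge scores, heavier trees always score lower, which matches the statement that relevance is inversely related to tree weight.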

 We need not just the tree with the highest relevance score but also the other trees with high scores  Answers have to be generated incrementally so that the user is provided with the ‘best’ answers at the beginning  The graph is assumed to fit in memory, since only row IDs and an index mapping row IDs to nodes in the graph need to be stored

 Incrementally computes search results  Starts at the leaf nodes, each containing a query keyword  Runs concurrent copies of a single-source shortest path algorithm, one from each such node  Traverses the graph edges backwards  The confluence of backward paths identifies answer tree roots  Outputs a node whenever it is in the intersection of the sets of nodes reached from each keyword  Answer trees may not be generated in relevance order  Answers are inserted into a small buffer (heap)  The highest-ranked answer in the buffer is output to the user when the buffer is full
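The core idea of the backward expanding search can be sketched as follows. This is a simplified, non-incremental version written under assumptions: it runs one multi-source Dijkstra per keyword over reversed edges instead of one interleaved iterator per keyword node, it omits the output buffer, and it ranks roots by the sum of path weights (which counts shared edges more than once). The names `backward_distances` and `candidate_roots` and the `(u, v, weight)` edge format are hypothetical.

```python
import heapq
import math

def backward_distances(reverse_adj, sources):
    """Dijkstra over reversed edges.

    dist[x] is the weight of the cheapest path from x to any node in
    `sources`, following edges in their original direction.
    """
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for pred, w in reverse_adj.get(node, []):
            nd = d + w
            if nd < dist.get(pred, math.inf):
                dist[pred] = nd
                heapq.heappush(heap, (nd, pred))
    return dist

def candidate_roots(edges, keyword_sets):
    """Return (total path weight, root) pairs, cheapest first.

    edges        : list of (u, v, weight) directed edges
    keyword_sets : one collection of matching nodes per keyword
    """
    reverse_adj = {}
    for u, v, w in edges:
        reverse_adj.setdefault(v, []).append((u, w))
    per_keyword = [backward_distances(reverse_adj, s) for s in keyword_sets]
    # A root must have been reached backwards from every keyword.
    common = set(per_keyword[0])
    for dist in per_keyword[1:]:
        common &= set(dist)
    return sorted((sum(d[r] for d in per_keyword), r) for r in common)

edges = [
    ("paper1", "writes1", 1.0), ("paper1", "writes2", 1.0),
    ("writes1", "roy", 1.0), ("writes2", "sudarshan", 1.0),
]
roots = candidate_roots(edges, [{"roy"}, {"sudarshan"}])
```

In this toy graph only `paper1` has a directed path to both keyword nodes, so it is the single candidate root. The incremental variant described on the slide would instead advance the shortest path iterators step by step and emit a root as soon as every keyword has reached it.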

 Model (Query : Roy Sudarshan)

 Because the answers are Steiner trees, a lot of time is spent on wasteful exploration of the graph  As the number of keyword nodes increases, the feasibility of the algorithm decreases  Connection trees are only approximately sorted in increasing order of their weights  Node weights are not considered during the search, hence trees may not be produced in exact decreasing order of relevance

 The BANKS system provides  A rich interface to browse data stored in a relational database  Automatically generated browsable views of database relations and query results  Schema browsing and data browsing  A hyperlink from each foreign key value to the referenced tuple

 Functionalities  Columns can be projected away  Selections can be imposed on columns  Joins can be performed on foreign key columns by joining them with the referencing tables  Results can be grouped by a column, which returns only the distinct values in the column being displayed  Sorting can be done on columns

 Cross Tabs  Group By template to view data hierarchically  Folder views modeled after the folder view supported by Windows Explorer etc.  Graphical views in the form of a bar chart, line chart or pie chart, with HTML image maps to embed hyperlinks in the graphics

 Datasets of varying sizes have been tested  No agreed upon benchmarks for Ranking Algorithms in this domain  System was found to return the most intuitive answers

 Ideal answers were obtained for a set of different queries  Compute the absolute value of the difference between the rank of each ideal answer and its rank in the answers returned for a given parameter setting  The sum of the rank differences gives the raw error score for that parameter setting  We map error scores against λ and log-scaling of edge weights
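The raw error score described above is a sum of absolute rank differences. A minimal sketch follows, assuming every ideal answer also appears in the observed ranking (the slides do not say how missing answers are penalized); the function name and the dictionary layout are hypothetical.

```python
def raw_error_score(ideal_ranks, observed_ranks):
    """Sum of |ideal rank - observed rank| over the ideal answers.

    ideal_ranks    : dict answer -> rank in the ideal ordering
    observed_ranks : dict answer -> rank produced under one parameter setting
    Assumes every ideal answer is present in observed_ranks.
    """
    return sum(abs(rank - observed_ranks[answer])
               for answer, rank in ideal_ranks.items())

ideal = {"a": 1, "b": 2, "c": 3}
observed = {"a": 1, "b": 3, "c": 2}   # answers b and c are swapped
error = raw_error_score(ideal, observed)
```

Repeating this computation for each value of λ, with and without log-scaling of edge weights, gives the error scores that are compared across parameter settings.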

 Setting λ = 0.2 produced the best results, while λ = 1 produced the worst, with error scores of around 15  Log scaling of edge weights is important, as otherwise back-edges from popular nodes would result in correct answers getting low relevance scores  The choice of additive or multiplicative combination has no effect on ranking  Log scaling of node weights was abandoned, as scaling and not scaling produced the same ranking

 Effective for queries matching non-metadata keywords  Brings to light data that might not otherwise be readily available on the web to the non-technical user  The higher the number of keywords, the less useful the backward expanding search algorithm becomes  Over-reliance on Java can at times slow down the application