Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina.

Slides:



Advertisements
Similar presentations
Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.
Advertisements

Ranking Outliers Using Symmetric Neighborhood Relationship Wen Jin, Anthony K.H. Tung, Jiawei Han, and Wei Wang Advances in Knowledge Discovery and Data.
Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,
13/04/20151 SPARK: Top- k Keyword Query in Relational Database Wei Wang University of New South Wales Australia.
Retrieval Evaluation J. H. Wang Mar. 18, Outline Chap. 3, Retrieval Evaluation –Retrieval Performance Evaluation –Reference Collections.
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
Efficient IR-Style Keyword Search over Relational Databases Vagelis Hristidis University of California, San Diego Luis Gravano Columbia University Yannis.
 Introduction  Views  Related Work  Preliminaries  Problems Discussed  Algorithm LPTA  View Selection Problem  Experimental Results.
INSTRUCTOR: DR.NICK EVANGELOPOULOS PRESENTED BY: QIUXIA WU CHAPTER 2 Information retrieval DSCI 5240.
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
Effective Keyword Based Selection of Relational Databases Bei Yu, Guoliang Li, Karen Sollins, Anthony K.H Tung.
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.
Effective Keyword Search in Relational Databases Fang Liu (University of Illinois at Chicago) Clement Yu (University of Illinois at Chicago) Weiyi Meng.
Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom May 10 Hyewon Lim.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
SPARK: Top-k Keyword Query in Relational Databases Yi Luo, Xuemin Lin, Wei Wang, Xiaofang Zhou Univ. of New South Wales, Univ. of Queensland SIGMOD 2007.
Evaluating Search Engine
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
Evaluation.  Allan, Ballesteros, Croft, and/or Turtle Types of Evaluation Might evaluate several aspects Evaluation generally comparative –System A vs.
Mutual Information Mathematical Biology Seminar
1 Statistical correlation analysis in image retrieval Reporter : Erica Li 2004/9/30.
Chapter 5: Query Operations Baeza-Yates, 1999 Modern Information Retrieval.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Summary. Chapter 9 – Triggers Integrity constraints Enforcing IC with different techniques –Keys –Foreign keys –Attribute-based constraints –Schema-based.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Fa05CSE 182 CSE182-L5: Scoring matrices Dictionary Matching.
Evaluation.  Allan, Ballesteros, Croft, and/or Turtle Types of Evaluation Might evaluate several aspects Evaluation generally comparative –System A vs.
INTRODUCTION TO DATABASE USING MS ACCESS 2013 PART 2 NOVEMBER 4, 2014.
Authors: Bhavana Bharat Dalvi, Meghana Kshirsagar, S. Sudarshan Presented By: Aruna Keyword Search on External Memory Data Graphs.
 Clustering of Web Documents Jinfeng Chen. Zhong Su, Qiang Yang, HongHiang Zhang, Xiaowei Xu and Yuhen Hu, Correlation- based Document Clustering using.
Minimal Test Collections for Retrieval Evaluation B. Carterette, J. Allan, R. Sitaraman University of Massachusetts Amherst SIGIR2006.
Introduction to Accounting Information Systems
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Probabilistic Ranking of Database Query Results Surajit Chaudhuri, Microsoft Research Gautam Das, Microsoft Research Vagelis Hristidis, Florida International.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
1 Efficient Search Ranking in Social Network ACM CIKM2007 Monique V. Vieira, Bruno M. Fonseca, Rodrigo Damazio, Paulo B. Golgher, Davi de Castro Reis,
Query Operations J. H. Wang Mar. 26, The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text.
Querying Structured Text in an XML Database By Xuemei Luo.
Structured Querying of Web Text A Technical Challenge Kulsawasd Jitkajornwanich University of Texas at Arlington CSE6339 Web Mining.
Lecture2: Database Environment Prepared by L. Nouf Almujally & Aisha AlArfaj 1 Ref. Chapter2 College of Computer and Information Sciences - Information.
EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Cuoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong.
Object Persistence Design Chapter 13. Key Definitions Object persistence involves the selection of a storage format and optimization for performance.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
Distributed Information Retrieval Server Ranking for Distributed Text Retrieval Systems on the Internet B. Yuwono and D. Lee Siemens TREC-4 Report: Further.
Summarization of XML Documents K Sarath Kumar. Outline I.Motivation II.System for XML Summarization III.Ranking Model and Summary Generation IV.Example.
Construction of Substitution Matrices
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
Date : 2012/10/25 Author : Yosi Mass, Yehoshua Sagiv Source : WSDM’12 Speaker : Er-Gang Liu Advisor : Dr. Jia-ling Koh 1.
SINGULAR VALUE DECOMPOSITION (SVD)
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Web- and Multimedia-based Information Systems Lecture 2.
Ranking objects based on relationships Computing Top-K over Aggregation Sigmod 2006 Kaushik Chakrabarti et al.
Language Model in Turkish IR Melih Kandemir F. Melih Özbekoğlu Can Şardan Ömer S. Uğurlu.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Construction of Substitution matrices
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Basics of Databases and Information Retrieval1 Databases and Information Retrieval Lecture 1 Basics of Databases and Information Retrieval Instructor Mr.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Written By: Presented By: Swarup Acharya,Amr Elkhatib Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy Join Synopses for Approximate Query Answering.
An Efficient Algorithm for Incremental Update of Concept space
Mikhail Bilenko, Sugato Basu, Raymond J. Mooney
Feature Selection for Ranking
Projects….
CoXML: A Cooperative XML Query Answering System
Presentation transcript:

Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina

Introduction The database selection problem for relational data sources  Demand for keyword-based searches of structured databases  What about searches over multiple structured databases?

Current Approach Document collections are summarized with keyword frequency lists Inadequate because…  Relational tables are typically normalized  Results must consider number of join operations Example: query Q = {multimedia, database, VLDB}

The Usefulness of a Database in Answering a Keyword Query A database is useful if it has high quality results to the keyword query. What are “high quality” results?  The results contain all the query keywords.  There is a keyword relationship – the query keywords can be connected meaningfully in the database.

This paper proposes… A method that summaries the relationships between keywords in a database. A ranking method based on the keyword summaries in order to select the most useful databases for a given keyword query.

Why Summaries? We have an equation which measures the usefulness of a database DB to query Q But calculating this score isn’t feasible for every database in the system! So we estimate the usefulness of the database using a summary of the relationships between all pairs of keywords in the database.

Results of a Keyword Query A minimal tree of tuples containing all the query keywords * tuples in the tree are connected according to their relationships defined in the database schema The more distant the relationships, the weaker the relationships connecting the tuples. distance - the number of join operations in a joining sequence of tuples Examples: t 1 t 2 t 3, distance = 2 t 4, distance = 0 DB2 of Figure 1, Q = {multimedia, binder}  t 1 t 5 t 12 or t 4 t 9 t 12, distance = 2  t 15 t 10 t 1 t 5 t 12, distance = 4

The Matrices Each relational database DB has several types of matrices  one D matrix – represents the presence or absence of each keyword in each tuple in DB  multiple T matrices – represents the relationship between tuples in a relational database at each distance  construct the W d matrices – the frequencies of keyword pairs at distance d; Example: Figure 3

Calculate the Key Relationship Matrix (KRM) w d (k i, k j ) is the frequency of d-distance joining sequences to connect the pair of keywords k i and k j is the maximum number of join operations (user-specified) Example: Figure 4 Q = {database:VLDB} for DB2 in Figure 3, the only non-zero is when d = 2 1/(2+1) * 1 occurrence = 0.33

Implementation in SQL The generation of the matrices: D, T 1, T 2, … T and W 0, W 1, …, W, for each DB, can be performed conveniently inside the local RDBMS using SQL. Tables are created for each matrix. Tables can be maintained dynamically when tuples are added to the database or old tuples are deleted.

Data Sources Selection Using KRM Estimate the score of the data source DB to Q through the scores of the individual pairs. With the KR-summary, we can effectively rank a set of databases. A higher score of the data source DB to Q indicates a better ranking.

Definitions & Metrics for Comparing Experimental Results real ranking – the “real” score of each database based on Equation 2-1 recall – compares the accumulated score of the top l databases selected based on the summaries of the source databases against the total available score when we select top l databases according to real ranking (summaries vs. real ranking) precision – measures the fraction of the top l selected databases which have high quality results

Experimental Results Observations: The selection performance of KR-summaries generally gets better as gets larger. The precision and recall performance for different values of tends to cluster into groups. There are big gaps in both precisions and recalls between KR-summaries when 0 ≤ ≤ 1 and when is greater.

Conclusion “The estimation method is effective in assigning high ranks to useful databases, although less relevant or irrelevant databases might be selected.”