A Cooperative Database System (CoBase) for Query Relaxation
Wesley W. Chu, Hua Yang, and Gladys Chow
Presented by David Liu

Presentation transcript:


6/24/2015 David Liu, UCB Database Seminar

Motivation
- Often when you query, you want ‘about the same’ rather than ‘exactly’
  - Medical image diagnosis: match images to diseases
- Other times, you might not even want near items, just the least far
  - ARPA/Rome Laboratory Planning Initiative (ARPI) transportation problem

High-level description of the solution
- View a query Q’s response set R as a subset of all information stored in the database
- All records in R satisfy a set of constraints C put forth by Q
- If R is empty, then perform incremental relaxation
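The incremental relaxation described above can be sketched as a loop over progressively looser constraint sets. This is a minimal illustration, not CoBase code: the function name `query_with_relaxation`, the record layout, and the hand-written constraint levels are all assumptions made for the example (in CoBase the looser constraints would be derived from a type abstraction hierarchy rather than listed by hand).

```python
def query_with_relaxation(records, constraint_levels):
    """Try each constraint in turn and return the first non-empty answer set.

    constraint_levels[0] is the exact constraint C; each later entry is a
    looser (relaxed) version. Returns (level_used, answers), or (None, [])
    if even the loosest constraint matches nothing.
    """
    for level, satisfies in enumerate(constraint_levels):
        answers = [r for r in records if satisfies(r)]
        if answers:
            return level, answers
    return None, []


# Hypothetical data: no student has GPA exactly 3.7.
students = [{"name": "Ann", "gpa": 3.667}, {"name": "Bob", "gpa": 3.2}]
levels = [
    lambda r: r["gpa"] == 3.7,          # exact query
    lambda r: 3.6 <= r["gpa"] <= 3.8,   # first relaxation
    lambda r: 3.0 <= r["gpa"] <= 4.0,   # second relaxation
]
level, answers = query_with_relaxation(students, levels)
print(level, [r["name"] for r in answers])
```

Stopping at the first non-empty level keeps the answers as close to the original query as the available relaxations allow.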

CoBase
- Main design features:
  - Relaxation: if there’s no exact match, try to find a ‘close’ neighbor and see if it matches
  - Control: allow the user to control relaxations
  - Explanation: justify relaxations to the user in semantic terms

Architecture
[Figure. Source: A Cooperative Database System for Query Relaxation, page 4]

Demonstration

Relaxation: Type Abstraction Hierarchies
- Sample query: SELECT * FROM Students s WHERE s.GPA = 3.700
- Suppose that there are no students with GPA = 3.700, but some with … and another with …
- Conceptually, we might have wanted the query on the student table to return these tuples
- We can use Type Abstraction Hierarchies (TAHs) to classify GPAs conceptually

Relaxation: Type Abstraction Hierarchy (TAH)

TAH Operators
- There are two special operators used to exploit the TAH:
  - Generalize(node x): get the parent of x, which encapsulates instances that are similar to x
  - Specialize(node x): get the set of all instances represented by node x. Definition: …
  - Note: these two operators are not inverses

TAH Operators
- A relaxation can be seen as:
  - Specialize(Generalize(x)), where x is the value/predicate that we are trying to relax
- An n-level relaxation is then:
  - Specialize(Generalize^n(x)), which is the same as n iterative generalizations followed by a specialization
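The two operators and the n-level relaxation can be sketched on a small tree. This is an illustrative sketch only: the class `TahNode`, the helper names, the node labels, and the leaf values 3.352, 3.500, 3.667, and 4.000 are hypothetical choices for the example, not the hierarchy shown in the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TahNode:
    label: str
    value: Optional[float] = None            # set only on leaves (instances)
    children: List["TahNode"] = field(default_factory=list)
    parent: Optional["TahNode"] = field(default=None, repr=False, compare=False)

    def add(self, child: "TahNode") -> "TahNode":
        child.parent = self
        self.children.append(child)
        return child


def generalize(node: TahNode, n: int = 1) -> TahNode:
    """Move up n levels (stopping at the root)."""
    for _ in range(n):
        if node.parent is None:
            break
        node = node.parent
    return node


def specialize(node: TahNode) -> List[float]:
    """Collect every instance (leaf value) represented by the node."""
    if not node.children:
        return [node.value]
    values: List[float] = []
    for child in node.children:
        values.extend(specialize(child))
    return values


def relax(leaf: TahNode, n: int = 1) -> List[float]:
    """n-level relaxation: Specialize(Generalize^n(x))."""
    return specialize(generalize(leaf, n))


# Hypothetical GPA hierarchy: root -> B -> {C, A} -> leaves.
root = TahNode("all GPAs")
b = root.add(TahNode("B"))
c = b.add(TahNode("C"))
a = b.add(TahNode("A"))
c.add(TahNode("3.352", value=3.352))
c.add(TahNode("3.500", value=3.500))
a.add(TahNode("3.667", value=3.667))
target = a.add(TahNode("3.700", value=3.700))
a.add(TahNode("4.000", value=4.000))

print(relax(target, 1))   # siblings under node A
print(relax(target, 2))   # everything under node B
```

Because `specialize(generalize(x))` returns the whole sibling set rather than just `x`, the sketch also shows why the two operators are not inverses.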

Relaxation Example
- Example: subtree of the GPA TAH:
  - Generalize(3.700) will yield node A
  - Specialize(Generalize(3.700)) will yield the set of values: {3.667,…,4.000}
  - Specialize(Generalize^2(3.700)) will yield the following set: {3.352,…,3.700,…,4.000}

Multi-attribute Type Abstraction Hierarchy (MTAH)
- MTAHs are multiple-attribute type abstraction hierarchies
- They are a generalization of single-attribute TAHs
- MTAHs can be used to classify geographical data

MTAHs: Example
[Figure: MTAH over the locations Bizerte, Tunis, Saminjah, Sfax, Gabes, Jerba, Gafsa, El_Borma, Djedeida. Based on: A Cooperative Database System for Query Relaxation, page 6]

Automatic Generation of TAHs
- Main idea:
  - Recursively partition the search space into two until each partition has fewer than T items
  - Repartition each partition further to obtain an N-ary partition. This is done with a hill-climbing algorithm

Automatic Generation of TAHs
- Main idea:
  - Binary partitioning: recursively partition the search space into two until each partition has fewer than T items
  - N-ary partitioning: repartition each partition further to obtain an N-ary partition. This is done with a hill-climbing algorithm
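The binary partitioning step can be sketched as follows. This is a sketch under stated assumptions rather than the authors' algorithm: it assumes numeric values that are already sorted, equally likely values, absolute difference as the distance, and a cut chosen to minimize the size-weighted relaxation error of the two halves. The names `relaxation_error`, `best_cut`, and `build_partition` are hypothetical, and the N-ary hill-climbing refinement is not shown.

```python
def relaxation_error(values):
    """Average pairwise absolute difference (assumes equally likely values)."""
    n = len(values)
    if n == 0:
        return 0.0
    return sum(abs(x - y) for x in values for y in values) / (n * n)


def best_cut(sorted_values):
    """Index i (1 <= i < n) whose split [:i] / [i:] has the lowest weighted error."""
    n = len(sorted_values)

    def cost(i):
        left, right = sorted_values[:i], sorted_values[i:]
        return (len(left) * relaxation_error(left)
                + len(right) * relaxation_error(right)) / n

    return min(range(1, n), key=cost)


def build_partition(sorted_values, threshold):
    """Recursively split until each cluster has fewer than `threshold` items.

    Leaves are lists of values; internal nodes are (left, right) tuples.
    """
    if len(sorted_values) < threshold or len(sorted_values) < 2:
        return list(sorted_values)
    i = best_cut(sorted_values)
    return (build_partition(sorted_values[:i], threshold),
            build_partition(sorted_values[i:], threshold))


gpas = [3.0, 3.1, 3.2, 3.8, 3.9, 4.0]   # hypothetical, already sorted
print(build_partition(gpas, threshold=4))
```

Only contiguous cuts of the sorted list are tried, which is what keeps the search small compared with trying every possible grouping.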

Automatic Generation of TAHs
- After each partition, calculate the Categorical Utility of the partitioning to decide whether to terminate
- Relaxation error is used to measure utility

Complexity of TAH generation
- In general, partitioning is exponential: O(n^n), where n is the number of items
- Partitioning a sorted set into contiguous clusters allows O(n^2) worst-case performance and O(n log n) average performance

CoSQL
- Extension to SQL to add relaxation operators
  - Context free
  - Context sensitive
  - Control
  - Interactive

CoSQL: Context Free
- Approximate
  - ^v1
  - Returns values approximate to v1
- Between two members
  - between(v1, v2)
  - Returns values between two values
- Within a set
  - Within(v1, v2, …, vn)
  - Specifies set membership

CoSQL: Context Sensitive
- Context-sensitive nearness
  - Near-to X
- User-specified nearness
  - Similar-to X based-on ((a1 w1) (a2 w2) … (an wn))
  - The ai are attributes and the wi are weights
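One way to read the weighted attribute list is as a weighted distance used to rank candidates. The sketch below is an assumption about how such a ranking could work, not the CoBase definition: it uses absolute differences normalized by each attribute's observed range, and the function names, airport records, and weights are hypothetical.

```python
def weighted_distance(candidate, target, weights, ranges):
    """Weighted average of range-normalized absolute differences."""
    total = 0.0
    for attr, weight in weights.items():
        total += weight * abs(candidate[attr] - target[attr]) / ranges[attr]
    return total / sum(weights.values())


def similar_to(records, target, weights):
    """Rank records from most to least similar to the target."""
    ranges = {}
    for attr in weights:
        observed = [r[attr] for r in records]
        # Fall back to 1.0 so a constant attribute does not divide by zero.
        ranges[attr] = (max(observed) - min(observed)) or 1.0
    return sorted(records,
                  key=lambda r: weighted_distance(r, target, weights, ranges))


# Hypothetical airports described by runway length and width (in feet).
airports = [
    {"name": "A", "length": 8000, "width": 150},
    {"name": "B", "length": 6000, "width": 100},
    {"name": "C", "length": 7900, "width": 90},
]
target = {"length": 8000, "width": 100}
weights = {"length": 2.0, "width": 1.0}   # length matters twice as much

ranked = similar_to(airports, target, weights)
print([r["name"] for r in ranked])
```

Normalizing by range keeps an attribute measured in large units from dominating the score regardless of its weight.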

CoSQL: Control Operators
- Prioritization of relaxation
  - Relaxation-order(a1, a2, …, an)
- Relaxation restriction
  - Not-relaxable(a1, a2, …, an)
- Preference list
  - Preference-list(v1, v2, …, vn) on a particular attribute a
- Unacceptable values
  - Unacceptable-list(v1, v2, …, vn) on a particular attribute a

CoSQL: Control Operators cont’d
- Using another TAH
  - Alternative-TAH(TAH-Name)
- Restricting the amount of relaxation
  - Relaxation-level(v)
- Answer-set(s)
  - Specifies the minimum number of answers required
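The effect of the control operators can be sketched as plain functions that shape a relaxation plan and filter candidate values. Everything here is a hypothetical illustration: the function names, the level-by-level ordering of relaxation steps, and the sample attributes are assumptions, since the slides do not spell out how CoBase combines these operators.

```python
def relaxation_plan(attributes, relaxation_order=(), not_relaxable=(),
                    relaxation_level=1):
    """List the (attribute, level) steps a relaxer would try, in order.

    Assumption: attributes named in relaxation_order go first, attributes in
    not_relaxable are never relaxed, and all attributes are relaxed one level
    before any is relaxed a second level.
    """
    blocked = set(not_relaxable)
    ordered = [a for a in relaxation_order if a in attributes and a not in blocked]
    ordered += [a for a in attributes if a not in blocked and a not in ordered]
    return [(attr, level)
            for level in range(1, relaxation_level + 1)
            for attr in ordered]


def apply_value_controls(values, unacceptable=(), preference=()):
    """Drop unacceptable values, then put preferred values first."""
    rejected = set(unacceptable)
    kept = [v for v in values if v not in rejected]
    rank = {v: i for i, v in enumerate(preference)}
    # sorted() is stable, so non-preferred values keep their original order.
    return sorted(kept, key=lambda v: rank.get(v, len(rank)))


plan = relaxation_plan(
    ["airport", "runway_length", "aircraft"],
    relaxation_order=["runway_length"],
    not_relaxable=["aircraft"],
    relaxation_level=2,
)
print(plan)

cities = apply_value_controls(
    ["Tunis", "Sfax", "Gabes", "Bizerte"],
    unacceptable=["Gabes"],
    preference=["Bizerte", "Sfax"],
)
print(cities)
```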

CoSQL: Interactive Operators
- Nearer, further
  - These interactive operators are invoked after the user sees an answer set
  - They are not SQL per se
  - Used to interactively control geographical queries

Explanation Mediators
- With automated relaxation, the user loses understanding of what the system is doing
- The explanation mediator explains relaxations and justifies them to the user
- Explanations come from an explanation dictionary

Performance
- Queries from the ARPI transportation domain had the following results:
  - Database retrieval time: 10 secs
  - Query relaxation time: 2 secs, or 1/5 of database retrieval time
  - Explanation time: another 2 secs, or 1/5 of database retrieval time
  - Total overhead is about 40% (2 + 2 secs on top of 10 secs of retrieval)
  - The most important measure, relaxation quality, is difficult to measure
  - Unclear: exact running times of TAH generation and the storage space these TAHs need

TAHs and B-trees?
- TAHs are much like B-tree indexes:
  - Hierarchical
  - Cluster-based
  - Partition the search space
  - TAH : B-tree :: MTAH : R-tree
    - With the exception that R-trees allow overlapping partitions
  - A TAH is like an iterative access method that traverses up and down the tree

Applications
- Medical image matching
- ARPI transportation planning
- Electronic warfare

Evaluation
- Mutually exclusive partitioning could be a problem
  - The optimal arrangement for CoBase’s relaxation approach is to radiate outward from the query’s ‘epicenter’
- Multiple dimensions exacerbate the partitioning problem
- Indexing techniques that allow overlapping partitions might be beneficial

The End

Categorical Utility (CU)
- Categorical Utility is the objective value of a partition
- RE of a point:
  - x_i is a point, P(x_j) is the probability of point x_j

Categorical Utility (CU)
- Categorical Utility is the objective value of a partition
- RE of a partition:
  - C is a partition, the x_i are the points in the partition, P(x_i) is the probability of occurrence of each point, and RE(x_i) is the relaxation error of the point in the partition

Categorical Utility (CU)
- Categorical Utility is the objective value of a partition
- RE of a partitioning:
  - P is a partitioning, P(C_k) is the probability of occurrence of each partition, and RE(C_k) is the relaxation error of the partition
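The formulas themselves are not present in the transcript. A form consistent with the symbol definitions above can be written as follows, assuming that the distance between two points is their absolute difference (this distance is an assumption, not something stated in the slides):

```latex
\begin{aligned}
RE(x_i) &= \sum_{j} P(x_j)\,\lvert x_i - x_j \rvert
  && \text{relaxation error of a point } x_i \text{ in partition } C \\
RE(C)   &= \sum_{i} P(x_i)\,RE(x_i)
  && \text{relaxation error of a partition } C \\
RE(P)   &= \sum_{k} P(C_k)\,RE(C_k)
  && \text{relaxation error of a partitioning } P = \{C_1, \dots, C_N\}
\end{aligned}
```

Read this way, each level is the probability-weighted average of the level below it: a point's error averages its distance to the other points in its partition, a partition's error averages over its points, and a partitioning's error averages over its partitions.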