Multi-object Similarity Query Evaluation Michal Batko.

Slides:



Advertisements
Similar presentations
The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.
Advertisements

1 Spatial Join. 2 Papers to Present “Efficient Processing of Spatial Joins using R-trees”, T. Brinkhoff, H-P Kriegel and B. Seeger, Proc. SIGMOD, 1993.
Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks Jie BaoChi-Yin ChowMohamed F. Mokbel Department of Computer Science and Engineering.
PARTITIONAL CLUSTERING
The Palm-tree Index Indexing with the crowd Ahmed R Mahmood*Walid G. Aref* Eduard Dragut*Saleh Basalamah** *Purdue University**Umm AlQura University.
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
Danzhou Liu Ee-Peng Lim Wee-Keong Ng
Searching on Multi-Dimensional Data
MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Implementation of Other Relational Algebra Operators, R. Ramakrishnan and J. Gehrke1 Implementation of other Relational Algebra Operators Chapter 12.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Mario Rodriguez Revollo School of Computer Science, UCSP SlimSS-tree: A New Tree Combined SS- tree With Slim-down Algorithm Lifang Yang, Xianglin Huang,
BTrees & Bitmap Indexes
Scalable and Distributed Similarity Search in Metric Spaces Michal Batko Claudio Gennaro Pavel Zezula.
Visual Querying By Color Perceptive Regions Alberto del Bimbo, M. Mugnaini, P. Pala, and F. Turco University of Florence, Italy Pattern Recognition, 1998.
Basic concepts of Data Mining, Clustering and Genetic Algorithms Tsai-Yang Jea Department of Computer Science and Engineering SUNY at Buffalo.
Presented by Zeehasham Rasheed
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Spatial Indexing I Point Access Methods. Spatial Indexing Point Access Methods (PAMs) vs Spatial Access Methods (SAMs) PAM: index only point data Hierarchical.
Rate-based Data Propagation in Sensor Networks Gurdip Singh and Sandeep Pujar Computing and Information Sciences Sanjoy Das Electrical and Computer Engineering.
Database Management 9. course. Execution of queries.
Querying Structured Text in an XML Database By Xuemei Luo.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
Approximate XML Joins Huang-Chun Yu Li Xu. Introduction XML is widely used to integrate data from different sources. Perform join operation for XML documents:
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
Online aggregation Joseph M. Hellerstein University of California, Berkley Peter J. Haas IBM Research Division Helen J. Wang University of California,
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.
A fast algorithm for the generalized k- keyword proximity problem given keyword offsets Sung-Ryul Kim, Inbok Lee, Kunsoo Park Information Processing Letters,
The Anatomy of a Large-Scale Hyper textual Web Search Engine S. Brin, L. Page Presenter :- Abhishek Taneja.
Efficient Metric Index For Similarity Search Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, Gang Chen.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Mobile Agent Migration Problem Yingyue Xu. Energy efficiency requirement of sensor networks Mobile agent computing paradigm Data fusion, distributed processing.
Chapter 4: Pattern Recognition. Classification is a process that assigns a label to an object according to some representation of the object’s properties.
Nearest Neighbor Queries Chris Buzzerd, Dave Boerner, and Kevin Stewart.
INTERACTIVELY BROWSING LARGE IMAGE DATABASES Ronald Richter, Mathias Eitz and Marc Alexa.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.
Euripides G.M. PetrakisIR'2001 Oulu, Sept Indexing Images with Multiple Regions Euripides G.M. Petrakis Dept. of Electronic.
CS4432: Database Systems II Query Processing- Part 2.
A New Spatial Index Structure for Efficient Query Processing in Location Based Services Speaker: Yihao Jhang Adviser: Yuling Hsueh 2010 IEEE International.
Spring 2003 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2003 Yanyong Zhang
Data and Knowledge Engineering Laboratory Clustered Segment Indexing for Pattern Searching on the Secondary Structure of Protein Sequences Minkoo Seo Sanghyun.
Multi-dimensional Search Trees CS302 Data Structures Modified from Dr George Bebis.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
By: Gang Zhou Computer Science Department University of Virginia 1 Medians and Beyond: New Aggregation Techniques for Sensor Networks CS851 Seminar Presentation.
Database Management Systems, R. Ramakrishnan 1 Algorithms for clustering large datasets in arbitrary metric spaces.
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Spring 2004 ECE569 Lecture 05.1 ECE 569 Database System Engineering Spring 2004 Yanyong Zhang
Query by Image and Video Content: The QBIC System M. Flickner et al. IEEE Computer Special Issue on Content-Based Retrieval Vol. 28, No. 9, September 1995.
The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Best-first search is a search algorithm which explores a graph by expanding the most promising node chosen according to a specified rule.
CS791 - Technologies of Google Spring A Web­based Kernel Function for Measuring the Similarity of Short Text Snippets By Mehran Sahami, Timothy.
SIMILARITY SEARCH The Metric Space Approach
Christian Böhm, Bernhard Braunmüller, Florian Krebs, and Hans-Peter Kriegel, University of Munich Epsilon Grid Order: An Algorithm for the Similarity.
RE-Tree: An Efficient Index Structure for Regular Expressions
Introduction to Query Optimization
Comparing Genetic Algorithm and Guided Local Search Methods
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
Similarity Search: A Matching Based Approach
Overview of Query Evaluation
Evaluation of Relational Operations: Other Techniques
LSH-based Motion Estimation
Presentation transcript:

Multi-object Similarity Query Evaluation Michal Batko

Motivation Applications where we have –multiple samples of the search object –best match to any of the given objects Examples –Multiple camera shots Tourist takes multiple pictures of a building –Face recognition –3D scanned input –… Laboratory of Data Intensive Systems and Applications2

Multi-object Query Definition Regular similarity query-by-example –one query object + parameters Multi-object query Q({q 1, q 2, …, q n }, params) –several query objects + parameters –goal is to find the results that are best with respect to any of the query objects 3 r q1q1 r q2q2 Range query – R({q1,q2}, r) q1q1 q2q2 k = 5 kNN query – kNN({q1,q2}, k)

State of the Art Relational databases –Optimizations for searching multiple values in indexes Text search –Special case: multiple query documents Combined vector of keywords (cannot just boost the frequencies) Multi-modal search –Specialization of the technique: Only one domain, minimum aggregation function Vector space approach –Compute a new single-query object Tree structure evaluation extension –Multiple paths of a tree are traversed Laboratory of Data Intensive Systems and Applications4

Base-line Approach (Multi-modal search) Evaluate a separate query for each query object Merge the results Pros –Use engine without modification –Can evaluate any query Cons –Same part of the index is likely visited multiple times –Parallelization drawback Laboratory of Data Intensive Systems and Applications5 Multi-object query Search engine q1q1 q2q2 Query Result 1Result 2 Merged result

Vector space approach Compute a single query –Can be centroid of the queries or some specialized algorithm that utilizes other knowledge about the space Execute as normal single query Pros –Use engine without modification Cons –Need to specify the query merging algorithm –Approximation Laboratory of Data Intensive Systems and Applications6 Multi-object query Search engine q1q1 q2q2 Result q 1 +q 2

Tree-based approach Modify tree traversal –In each branch, traverse if any of the query objects matches Pros –Efficient, precise –Data are accessed once Cons –Specialized structure is needed –Usable for “prunable” queries such as range query modification for kNN exists –traverse to most likely leaf for each query –use the radius to backtrack Laboratory of Data Intensive Systems and Applications7 Multi-object query Search engine q1q1 q2q2 Result

Hash-based approach (M-Index) Inspired by the tree multi-object search –Compute the hash function for each query object If a tree is used to compute the hash, use tree m-o-q –Expand the accessed buckets to the neighbors Using the heuristic for computing the adjacent cell probability –Sort the buckets according to the minimum distance to the set of query objects Could be incorporated into the heuristic? Laboratory of Data Intensive Systems and Applications8

Hash-based approach (M-Index) Laboratory of Data Intensive Systems and Applications9

Conclusion Multi-object query –Potentially useful in various application –Not much studied in context of similarity search Algorithm for M-Index can be designed –Heuristic for finding the best-match buckets needs to be properly specified and evaluated Explore the properties of Voronoi space –Can the promising cells be identified directly? Laboratory of Data Intensive Systems and Applications10