HPCLatAm 2013
Permutation Index and GPU to Solve Efficiently Many Queries
Authors: Mariela Lopresti, Natalia Miranda, Fabiana Piccoli, Nora Reyes
Universidad Nacional de San Luis

OBJECTIVES
- Speed up multimedia database queries with a search index and High Performance Computing.
- Search index: Permutation Index.
- High Performance Computing: parallel programming on NVIDIA GPUs.

INTRODUCTION
- Multimedia data.
- How can queries be resolved?
- Similarity search.
- Metric Space Model: a paradigm that can model all similarity search problems.
- Metric database: stores objects of a metric space and answers similarity queries over them.

INTRODUCTION
- A metric space (X, d) is composed of a universe of valid objects X and a distance function d : X × X → R+ defined among them.
- The distance function determines the similarity (or dissimilarity) between two given objects and satisfies the properties that make it a metric: non-negativity, identity (d(x, y) = 0 only when x = y), symmetry, and the triangle inequality.
- Similarity search: given a dataset U ⊆ X of |U| = n objects, a query can be trivially answered by performing n distance evaluations.
- There are two main queries of interest:
  - Range search.
  - The k nearest neighbors (k-NN).
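As a point of reference, the trivial solution that performs n distance evaluations per query could be sketched as follows. This is a minimal sequential Python sketch, not the authors' code: the function names and the sample word list are hypothetical, and the edit distance is used only because it is the metric named later in the experiments.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_search(dataset, query, radius, dist=edit_distance):
    """Brute force: one distance evaluation per object, keep those within radius."""
    return [u for u in dataset if dist(query, u) <= radius]


def knn_search(dataset, query, k, dist=edit_distance):
    """Brute force: sort every object by distance to the query and keep the first k."""
    return sorted(dataset, key=lambda u: dist(query, u))[:k]


# Hypothetical sample data, for illustration only.
words = ["cat", "cart", "card", "dog", "cot"]
print(range_search(words, "cat", 1))  # ['cat', 'cart', 'cot']
print(knn_search(words, "cat", 2))    # ['cat', 'cart']
```

Both functions touch every object in the dataset, which is the cost an index tries to avoid.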

SEARCH INDEX
- The information saved in an index varies: some indices store a subset of the distances between objects, others keep only a range of distance values.
- The goal is to preprocess the dataset so that queries can be answered with as few distance computations as possible.
- One of these indices is the Permutation Index.

INDEX: PERMUTATION
- The permutation-based algorithm is probabilistic.
- It predicts proximity between elements by comparing their permutations.
- If two elements are similar, their permutations are expected to be similar.
- Preprocessing step: compute the permutation of each element of the database, that is, the order in which the element sees a fixed set of reference objects (the permutants) when they are sorted by distance to it.
- All permutations are stored to form the index.
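The preprocessing step can be illustrated with a small sketch. It assumes that a permutation is the list of permutant indices (indices into a fixed set of reference objects) sorted by increasing distance to the element, with ties broken by the lower index. The numeric data and the absolute-difference metric are hypothetical stand-ins chosen to keep the example short.

```python
def permutation_of(obj, permutants, dist):
    """Order permutant indices by increasing distance to obj (ties: lower index first)."""
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))


def abs_diff(a, b):
    """A simple metric on numbers, used here only for illustration."""
    return abs(a - b)


# Hypothetical permutants and database.
permutants = [10, 50, 90]
database = [12, 15, 45, 80]

# The index stores one permutation per database element.
index = {u: permutation_of(u, permutants, abs_diff) for u in database}
print(index)  # {12: [0, 1, 2], 15: [0, 1, 2], 45: [1, 0, 2], 80: [2, 1, 0]}
```

The nearby elements 12 and 15 receive the same permutation, while the distant element 80 receives the reversed one, which is the behavior the index relies on.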

GPU - CUDA
- The GPU was developed with a highly parallel structure and high memory bandwidth.
- The GPU achieves high throughput because thousands of threads compute concurrently.
- GPU characteristics:
  - Several streaming multiprocessors.
  - CPU-GPU memory hierarchy.
  - Threads running in parallel.

PERMUTATION ON GPU
- Building a search index: permutants.
- Solving similarity queries on a database.

GPU-CUDA PERMUTATION INDEX
The indexing process has two stages:
1- Compute the distance between every object in the database and each permutant.
2- Set up the signatures of all objects in the database, i.e. all object permutations. Each thread computes one object permutation.
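The two stages can be sketched sequentially as below. This is a CPU-side Python illustration, not the CUDA kernels from the talk: each loop iteration stands in for the work that one GPU thread would do, and the 2D points, the Manhattan metric, and all function names are assumptions made for the example.

```python
def distance_matrix(objects, permutants, dist):
    """Stage 1: one row per object, one column per permutant."""
    return [[dist(o, p) for p in permutants] for o in objects]


def signatures(matrix):
    """Stage 2: turn each row of distances into a permutation of permutant indices."""
    index = []
    for row in matrix:  # on a GPU, one thread would process one row
        index.append(sorted(range(len(row)), key=lambda i: (row[i], i)))
    return index


def manhattan(a, b):
    """L1 distance between 2D points, used here only for illustration."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical objects and permutants.
objects = [(0, 0), (5, 5), (9, 1)]
permutants = [(0, 1), (8, 0), (5, 6)]

matrix = distance_matrix(objects, permutants, manhattan)
index = signatures(matrix)
print(matrix)  # [[1, 8, 11], [9, 8, 1], [9, 2, 9]]
print(index)   # [[0, 1, 2], [2, 1, 0], [1, 0, 2]]
```

Separating the stages keeps the distance computations, usually the expensive part, independent of the per-object sorting.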

SOLVING APPROXIMATE QUERIES
1- Compute the permutation of the query object. Each thread computes one permutation.
2- Compare the permutation of the query object with every permutation in the index, using the footrule distance.
3- Sort the footrule distances. They are sorted with a parallel implementation of quicksort.
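Steps 2 and 3 can be sketched as follows, assuming the Spearman footrule: the sum, over all permutants, of the absolute difference between the positions a permutant occupies in the two permutations. Python's built-in sort replaces the parallel quicksort here, and the function names and sample index are hypothetical.

```python
def footrule(perm_a, perm_b):
    """Spearman footrule: total displacement of each permutant between two permutations."""
    pos_b = {permutant: rank for rank, permutant in enumerate(perm_b)}
    return sum(abs(rank - pos_b[permutant]) for rank, permutant in enumerate(perm_a))


def rank_candidates(query_perm, index):
    """Return object ids ordered by footrule distance to the query permutation."""
    return sorted(range(len(index)),
                  key=lambda i: (footrule(query_perm, index[i]), i))


# Hypothetical index: one permutation per database object.
index = [[0, 1, 2], [2, 1, 0], [1, 0, 2]]
query_perm = [0, 1, 2]

print([footrule(query_perm, p) for p in index])  # [0, 4, 2]
print(rank_candidates(query_perm, index))        # [0, 2, 1]
```

Objects at the front of the ranked list are the most promising candidates for the query, and no distance to a database object has been computed yet.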

SOLVING APPROXIMATE QUERIES
4- Depending on the type of query, evaluate the selected objects.
4.1- Range search: select the items whose distance to the query is less than the reference range (the query radius).
4.2- k-NN search:
4.2.1- Compute the edit distance.
4.2.2- Sort the distances with quicksort and select the first k items of the sorted list.
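Step 4 might look like the sketch below. It assumes that only the first `budget` candidates of the footrule-ranked list are compared directly with the query. The `budget` parameter, the function names, and the numeric data with an absolute-difference metric are all hypothetical, and the slides do not state how many candidates are verified.

```python
def range_from_candidates(query, objects, ranked_ids, budget, radius, dist):
    """Verify the first `budget` candidates and keep those closer than radius.

    The strict '<' follows the wording of the slide; the usual definition uses '<='.
    """
    return [objects[i] for i in ranked_ids[:budget]
            if dist(query, objects[i]) < radius]


def knn_from_candidates(query, objects, ranked_ids, budget, k, dist):
    """Verify the first `budget` candidates, sort by real distance, keep k."""
    scored = sorted((dist(query, objects[i]), i) for i in ranked_ids[:budget])
    return [objects[i] for _, i in scored[:k]]


def abs_diff(a, b):
    """A simple metric on numbers, used here only for illustration."""
    return abs(a - b)


# Hypothetical database and a candidate order as produced by the footrule ranking.
objects = [12, 80, 45, 15]
ranked_ids = [0, 3, 2, 1]

print(range_from_candidates(14, objects, ranked_ids, 3, 3, abs_diff))  # [12, 15]
print(knn_from_candidates(14, objects, ranked_ids, 3, 1, abs_diff))    # [15]
```

A larger budget costs more distance evaluations but lowers the chance of missing a relevant object, which is the trade-off behind the recall and precision evaluation listed as future work.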

SOLVING MANY QUERIES IN PARALLEL
- Speeding up a single query is not enough; the capabilities of the GPU must also be leveraged to answer several queries in parallel.
- The permutation index is built once and then used to answer many queries.
- The GPU receives the set of queries and has to solve all of them.

ANALYSIS OF EXPERIMENTAL RESULTS
We ran experiments with:
- Database sizes: 4KB, 29KB and 84KB.
- Metric database: English words.
- Distance function: edit distance.
CPU characteristics: Intel Core i3, 2.13 GHz, 3 GB of memory.

ANALYSIS OF EXPERIMENTAL RESULTS
GPU characteristics:

GeForce GPU   Global Memory   SM   SP    Clock Rate   Compute Capability
GTX330        512 MB          6    48    1.04 GHz     1.2
GTX550Ti      1024 MB         4    192   1.96 GHz     2.1
GTX520MX      1024 MB         1    48    1.8 GHz      2.1

ANALYSIS OF EXPERIMENTAL RESULTS

Range search throughput
# permutants   GT520MX    GTX550Ti   GTX330
128            27639.72   29310.63   16973.21
64             29539.57   29362.77   16379.24
5              28197.27   29604.32   164740.46

k-NN throughput
# permutants   GT520MX    GTX550Ti   GTX330
128            19824.25   19377.68   10850.85
64             19797.83   18857.32   11137.65
5              19906.59   19121.16   11262.48

ANALYSIS OF EXPERIMENTAL RESULTS
- The following figures show the speedup obtained for range queries and k-NN queries when 80 queries are solved in parallel.
- Range queries show larger improvements than k-NN queries.
- The best case is the largest database with the maximum number of permutants.

ANALYSIS OF EXPERIMENTAL RESULTS
[Figure: Speedup of range search queries on three different GPUs.]
[Figure: Speedup of k-NN search queries for different numbers of parallel queries.]

ANALYSIS OF EXPERIMENTAL RESULTS
[Figure: Speedup of GPU-Qsort and Thrust on three different GPUs.]
- Our implementation obtains a better speedup than the solution using the Thrust library.
- GPU-Qsort is notably independent of GPU characteristics: it works well on all three GPUs.

CONCLUSIONS
- Implementation of an index: the Permutation Index, used to answer approximate similarity searches in databases of words.
- Empirical evaluation: improvements were obtained on the different architectures considered.

FUTURE WORK
- We plan an exhaustive experimental evaluation that considers other kinds of databases and compares against other solutions that apply GPUs to metric space similarity search.
- We also need to evaluate the retrieval effectiveness of the answers given by the Permutation Index as the number of objects directly compared with the query grows, using recall and precision measures.
- We intend to exploit the power of GPUs with optimization techniques that increase performance when solving many queries in parallel.

THANKS FOR YOUR ATTENTION
Questions?
Mariela Lopresti: omlopres@unsl.edu.ar
Natalia Miranda: ncmiran@unsl.edu.ar
Fabiana Piccoli: mpiccoli@unsl.edu.ar
Nora Reyes: nreyes@unsl.edu.ar
