Published by Jessica Edith Fleming. Modified over 9 years ago.
1
HPCLatAm 2013
Permutation Index and GPU to Solve Efficiently Many Queries
AUTHORS: Mariela Lopresti, Natalia Miranda, Fabiana Piccoli, Nora Reyes
UNIVERSIDAD NACIONAL DE SAN LUIS
2
OBJECTIVES
Speed up multimedia database queries by combining a search index with High Performance Computing.
Search index: Permutation Index.
High Performance Computing: parallel programming on NVIDIA GPUs.
3
INTRODUCTION
Multimedia data: how to resolve queries? Similarity search.
Metric Space Model: a paradigm that can model all similarity search problems.
Metric database: stores the objects of a metric space and supports similarity searches over them.
4
INTRODUCTION
A metric space (X, d) is composed of a universe of valid objects X and a distance function d : X × X → R⁺ defined among them. The distance function determines the similarity (or dissimilarity) between two given objects and satisfies several properties, such as symmetry and the triangle inequality, which make it a metric.
Similarity search: given a dataset U of |U| = n objects, a query can be trivially answered by performing n distance evaluations.
There are two main queries of interest:
Range search.
k-Nearest Neighbors (k-NN) search.
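The trivial solution mentioned above, one distance evaluation per database object, can be sketched as follows. This is only an illustration: the function names, the toy integer dataset and the `abs_diff` metric are assumptions made for the example, and the range comparison is taken as inclusive, the usual convention.

```python
def abs_diff(a, b):
    # Toy metric on integers, assumed here only for illustration.
    return abs(a - b)

def range_search(dataset, query, radius, dist):
    # One distance evaluation per object: n evaluations in total.
    return [obj for obj in dataset if dist(query, obj) <= radius]

def knn_search(dataset, query, k, dist):
    # Evaluate all n distances, then keep the k closest objects.
    return sorted(dataset, key=lambda obj: dist(query, obj))[:k]

dataset = [1, 5, 9, 12]
print(range_search(dataset, 6, 3, abs_diff))
print(knn_search(dataset, 6, 2, abs_diff))
```

An index aims to return the same kind of answer while evaluating `dist` on far fewer objects.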
5
SEARCH INDEX
The information stored in an index varies: some indices store a subset of the distances between objects, others keep only a range of distance values.
The goal is to preprocess the dataset so that queries can be answered with as few distance computations as possible.
One of these indices is the Permutation Index.
6
INDEX: PERMUTATION
The permutation-based algorithm is probabilistic.
The permutation of an element is the order in which it sees a fixed set of reference objects (the permutants), sorted by increasing distance.
The algorithm predicts proximity between elements by comparing their permutations: if two elements are similar, their permutations are similar too.
Preprocessing step: compute the permutation of each element of the database. All permutations are stored to form the index.
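A minimal sequential sketch of the preprocessing step, assuming that a permutation is the list of permutant indices sorted by increasing distance to the object. The names `permutation` and `build_index`, the tie-breaking by index and the toy integer metric are assumptions for the example, not the authors' code.

```python
def abs_diff(a, b):
    # Toy metric on integers, assumed here only for illustration.
    return abs(a - b)

def permutation(obj, permutants, dist):
    # Permutant indices ordered by increasing distance to obj;
    # ties are broken by permutant index (an assumption of this sketch).
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))

def build_index(dataset, permutants, dist):
    # The index is simply the permutation of every database object.
    return [permutation(obj, permutants, dist) for obj in dataset]

permutants = [0, 10, 4]
index = build_index([3, 9, 4], permutants, abs_diff)
print(index)
```

In this toy run the nearby objects 3 and 4 receive the same permutation, while 9 receives a different one, which is the property the index relies on.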
7
GPU - CUDA
The GPU was designed with a highly parallel structure and high memory bandwidth.
It achieves high throughput because it can run thousands of threads concurrently.
GPU characteristics:
Several streaming multiprocessors.
CPU-GPU memory hierarchy.
Threads running in parallel.
8
PERMUTATION ON GPU
Building the search index: permutants.
Solving similarity queries on a database.
9
GPU-CUDA PERMUTATION INDEX
The indexing process has two stages:
1- Calculate the distance between every object in the database and each permutant.
2- Set up the signatures of all objects in the database, i.e. the permutation of every object. Each thread computes the permutation of one object.
10
SOLVING APPROXIMATE QUERIES
1- Compute the permutation of the query object. Each thread computes one permutation.
2- Compare the permutation of the query object with those in the index, using the footrule distance.
3- Sort the footrule distances. They are sorted with a quicksort implemented in parallel.
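Steps 2 and 3 can be illustrated with the Spearman footrule, which sums, over all permutants, the absolute difference between the positions a permutant occupies in the two permutations. The sketch below is sequential and uses Python's built-in sort in place of the parallel quicksort named on the slide; the function names and sample permutations are assumptions for the example.

```python
def footrule(perm_a, perm_b):
    # Position of each permutant inside perm_b.
    pos_b = {p: i for i, p in enumerate(perm_b)}
    # Sum of position displacements over all permutants.
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

def rank_candidates(index, query_perm):
    # Database positions ordered by increasing footrule distance to the query
    # permutation (built-in stable sort, not the parallel quicksort of the slides).
    return sorted(range(len(index)),
                  key=lambda i: footrule(index[i], query_perm))

index = [[2, 0, 1], [1, 2, 0], [2, 0, 1]]
print(rank_candidates(index, [2, 0, 1]))
```

Objects at the front of the returned ranking are the most promising candidates; no real distance has been computed against them yet.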
11
SOLVING APPROXIMATE QUERIES
4- Depending on the type of query, evaluate the selected objects.
4.1- Range search: select the items whose distance to the query is less than the reference range.
4.2- k-NN search:
4.2.1- Compute the edit distance.
4.2.2- Sort the distances with the quicksort and select the first k items of the sorted list.
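Step 4 might look like the sketch below, where `candidates` is assumed to be the list of database positions selected by the footrule ranking. The function names, the inclusive range comparison and the toy integer metric (standing in for the edit distance) are assumptions for the example.

```python
def abs_diff(a, b):
    # Toy metric on integers; the slides use the edit distance on words.
    return abs(a - b)

def range_query(dataset, candidates, query, radius, dist):
    # Real distances are evaluated only on the selected candidates.
    return [dataset[i] for i in candidates if dist(query, dataset[i]) <= radius]

def knn_query(dataset, candidates, query, k, dist):
    # Evaluate the candidates, sort by real distance, keep the first k.
    scored = sorted((dist(query, dataset[i]), i) for i in candidates)
    return [dataset[i] for _, i in scored[:k]]

dataset = [3, 9, 4, 20]
candidates = [0, 2, 1]  # positions chosen by the footrule ranking (assumed)
print(range_query(dataset, candidates, 5, 2, abs_diff))
print(knn_query(dataset, candidates, 5, 1, abs_diff))
```

Because only the candidates are evaluated, the answer is approximate: a relevant object outside the candidate list is never seen.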
12
SOLVING MANY QUERIES IN PARALLEL
Speeding up the answer to a single query is not enough: the capabilities of the GPU must be leveraged to answer several queries in parallel.
The permutation index is built once and then used to answer many queries.
The GPU receives the set of queries and has to solve all of them.
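The idea of building the index once and then answering a batch of independent queries can be sketched on the CPU as below. A thread pool stands in for the GPU only to show that the queries do not depend on each other; it is not meant to reproduce GPU performance. All names, the `n_candidates` parameter and the toy data are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def abs_diff(a, b):
    # Toy metric on integers, assumed here only for illustration.
    return abs(a - b)

def permutation(obj, permutants, dist):
    # Permutant indices by increasing distance to obj (ties broken by index).
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))

def footrule(perm_a, perm_b):
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))

def answer_knn(query, k, n_candidates, dataset, permutants, index, dist):
    q_perm = permutation(query, permutants, dist)
    # Rank the whole database by footrule distance, keep the best candidates.
    ranked = sorted(range(len(dataset)),
                    key=lambda i: footrule(index[i], q_perm))
    # Evaluate the real distance only on those candidates.
    scored = sorted((dist(query, dataset[i]), i) for i in ranked[:n_candidates])
    return [dataset[i] for _, i in scored[:k]]

def answer_many(queries, k, n_candidates, dataset, permutants, dist):
    # The index is built once and shared, read-only, by every query.
    index = [permutation(obj, permutants, dist) for obj in dataset]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda q: answer_knn(q, k, n_candidates, dataset,
                                 permutants, index, dist),
            queries))

print(answer_many([5, 18], 1, 2, [3, 9, 4, 20], [0, 10, 4], abs_diff))
```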
13
ANALYSIS OF EXPERIMENTAL RESULTS
Experimental setup:
Database sizes: 4 KB, 29 KB and 84 KB.
Metric database: English words.
Distance function: edit distance.
CPU characteristics: Intel Core i3, 2.13 GHz, 3 GB of memory.
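For reference, the edit distance between two words can be computed with the classic dynamic-programming recurrence. The sketch assumes the unit-cost Levenshtein variant (insertion, deletion and substitution each cost 1); the slides do not state which variant was used.

```python
def edit_distance(a, b):
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))
```

Each evaluation costs time proportional to the product of the two word lengths, which is why avoiding distance computations matters for this database.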
14
ANALYSIS OF EXPERIMENTAL RESULTS
GPU characteristics:
GeForce GPU | Global memory | SM | SP  | Clock rate | Compute capability
GTX330      | 512 MB        | 6  | 48  | 1.04 GHz   | 1.2
GTX550Ti    | 1024 MB       | 4  | 192 | 1.96 GHz   | 2.1
GTX520MX    | 1024 MB       | 1  | 48  | 1.8 GHz    | 2.1
15
ANALYSIS OF EXPERIMENTAL RESULTS
Range search throughput:
# permutants | GT520MX  | GTX550Ti | GTX330
128          | 27639.72 | 29310.63 | 16973.21
64           | 29539.57 | 29362.77 | 16379.24
5            | 28197.27 | 29604.32 | 164740.46
k-NN throughput:
# permutants | GT520MX  | GTX550Ti | GTX330
128          | 19824.25 | 19377.68 | 10850.85
64           | 19797.83 | 18857.32 | 11137.65
5            | 19906.59 | 19121.16 | 11262.48
16
ANALYSIS OF EXPERIMENTAL RESULTS
The next figure shows the speedup obtained for range queries and k-NN queries when 80 queries are solved in parallel. Range queries show larger improvements than k-NN queries. The best case is the largest database with the maximum number of permutants.
17
ANALYSIS OF EXPERIMENTAL RESULTS
Figure: Speedup of range search queries on three different GPUs.
Figure: Speedup of k-NN search queries for different numbers of parallel queries.
18
ANALYSIS OF EXPERIMENTAL RESULTS
Figure: Speedup of GPU-Qsort and Thrust on three different GPUs.
Our implementation obtains a better speedup than the solution using the Thrust library. It is important to note that GPU-Qsort is independent of the GPU characteristics: it works well on all the GPUs tested.
19
CONCLUSIONS
Implementation of an index: the Permutation Index, used for approximate similarity searches in databases of words.
Empirical evaluation: improvements were obtained on the different architectures considered.
20
FUTURE WORK
We plan to carry out an exhaustive experimental evaluation considering other kinds of databases, and to compare with other solutions that apply GPUs to metric space similarity search.
We also need to evaluate the retrieval effectiveness of the answers of the Permutation Index as the number of objects directly compared with the query grows, using recall and precision measures.
We will exploit the power of GPUs through optimization techniques to increase performance when solving many queries in parallel.
21
THANKS FOR YOUR ATTENTION
Questions?
Mariela Lopresti: omlopres@unsl.edu.ar
Natalia Miranda: ncmiran@unsl.edu.ar
Fabiana Piccoli: mpiccoli@unsl.edu.ar
Nora Reyes: nreyes@unsl.edu.ar