Presentation transcript:

1 Sudhanshu Khemka

2 [image-only slide]

3  Treats each document as a vector, with one component corresponding to each term in the dictionary.  The weight of a component is calculated using the tf-idf weighting scheme, where tf is the total number of occurrences of the term in the document and idf is the inverse document frequency of the term.  As the query is also a mini document, the model represents the query as a vector as well.  The similarity between two vectors can then be found as follows:
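The formula referenced at the end of this slide appears to have been an image that did not survive the transcript. In the standard vector space model, the similarity is the cosine of the angle between the two tf-idf vectors. A minimal sketch (the toy corpus and query below are invented for illustration):

```python
import math
from collections import Counter

def tf_idf_vector(doc_terms, idf):
    # Weight of each component = tf * idf, with tf the raw count
    # of the term in the document.
    tf = Counter(doc_terms)
    return {t: tf[t] * idf[t] for t in tf}

def cosine_similarity(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus (illustrative only)
docs = [["gpu", "query", "processing"],
        ["cpu", "query", "algorithms"],
        ["gpu", "sorting"]]
N = len(docs)
vocab = {t for d in docs for t in d}
# idf(t) = log(N / df(t)), where df(t) = number of docs containing t
idf = {t: math.log(N / sum(t in d for d in docs)) for t in vocab}

query = ["gpu", "query"]  # the query is treated as a mini document
qv = tf_idf_vector(query, idf)
scores = [cosine_similarity(tf_idf_vector(d, idf), qv) for d in docs]
best = max(range(N), key=scores.__getitem__)
```

The document sharing the most distinctive terms with the query gets the highest cosine score.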

4 [image-only slide: similarity formula]

5 [image-only slide]

6  A lot of research has been done on developing efficient CPU algorithms that improve query response time.  We look at the task of improving query response time from a different perspective.  Instead of focusing solely on writing efficient algorithms for the CPU, we shift our focus to the processor itself and pose the following question: “Can we accelerate search engine query processing using the GPU?”

7  The GPU’s programming model is highly suitable for processing data in parallel.  It allows programmers to define a grid of thread blocks, where each thread in a thread block can execute a subset of the operations in parallel.  This is useful for information retrieval, as the score of each document can be computed in parallel.
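The data-parallel idea on this slide can be mimicked on the CPU with a thread pool standing in for the GPU's grid of thread blocks; the `score` function and document IDs below are placeholders, not something from the slides:

```python
from concurrent.futures import ThreadPoolExecutor

def score(doc_id):
    # Placeholder scoring function: on the GPU, one thread would run
    # the real per-document scoring; here it is a trivial stand-in.
    return (doc_id * 37) % 101

doc_ids = list(range(1000))

# Each worker scores documents independently, mirroring the idea of
# one thread per document within a grid of thread blocks.
with ThreadPoolExecutor(max_workers=8) as pool:
    scores = list(pool.map(score, doc_ids))
```

Because no document's score depends on another's, the work partitions cleanly, which is exactly the property that makes the GPU attractive here.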

8  Ding et al., in their paper “Using Graphics Processors for High Performance IR Query Processing,” implement a variant of the vector space model, Okapi BM25, on the GPU and demonstrate promising results.  Okapi BM25:  In particular, they provide data-parallel algorithms for inverted list intersection, list compression, and top-k scoring.

9  We propose an efficient implementation of the second ranking model, the language-model (LM) based approach to document scoring, on the GPU.  Method:  Apply a divide-and-conquer approach, since we need to compute P(q|d) for each document in the collection.  Each block on the GPU would calculate the scores of a subset of the documents, sort the scores, and transfer the results to an array in the GPU’s global memory.  After all the blocks have written their sorted scores to this array, we would use a parallel merge algorithm to merge the results and return the top k documents.  Satish et al., in their paper “Designing Efficient Sorting Algorithms for Manycore GPUs,” provide an efficient implementation of merge sort that is the fastest reported in the literature.
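The steps above can be sketched sequentially. The slide does not specify how P(q|d) is smoothed, so Jelinek-Mercer smoothing against the collection model is assumed here, and `heapq.merge` stands in for the parallel merge of the per-block sorted runs; the toy collection is invented for illustration:

```python
import heapq
import math
from collections import Counter

LAMBDA = 0.7  # Jelinek-Mercer mixing weight (an assumed smoothing choice)

def lm_score(query, doc, coll_tf, coll_len):
    # log P(q|d) under a smoothed unigram language model of the document.
    tf = Counter(doc)
    total = 0.0
    for t in query:
        p_doc = tf[t] / len(doc)
        p_coll = coll_tf[t] / coll_len
        total += math.log(LAMBDA * p_doc + (1 - LAMBDA) * p_coll)
    return total

# Toy collection (illustrative only)
docs = [["gpu", "query", "processing"],
        ["cpu", "query", "algorithms"],
        ["gpu", "sorting"],
        ["gpu", "query"]]
coll_tf = Counter(t for d in docs for t in d)
coll_len = sum(len(d) for d in docs)
query = ["gpu", "query"]

# Step 1: each "block" scores its subset of documents and sorts the
# (score, doc_id) pairs in descending order of score.
blocks = [(0, docs[:2]), (2, docs[2:])]
sorted_runs = []
for offset, block in blocks:
    run = sorted(((lm_score(query, d, coll_tf, coll_len), offset + i)
                  for i, d in enumerate(block)), reverse=True)
    sorted_runs.append(run)

# Step 2: merge the sorted runs (heapq.merge plays the role of the
# parallel merge on the GPU) and keep the top k documents.
k = 2
merged = heapq.merge(*sorted_runs, reverse=True)
top_k = [doc_id for _, doc_id in list(merged)[:k]]
```

Because each run is already sorted, the merge only ever compares the heads of the runs, which is the property a parallel merge exploits on the GPU.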

