Sudhanshu Khemka.  Treats each document as a vector with one component corresponding to each term in the dictionary  Weight of a component is calculated.

Sudhanshu Khemka

• The vector space model treats each document as a vector with one component for each term in the dictionary
• The weight of a component is calculated using the tf-idf weighting scheme, where tf is the total number of occurrences of the term in the document and idf is the inverse document frequency of the term
• As the query is also a mini document, the model represents the query as a vector too
• Similarity between two vectors can then be computed; the standard measure in the vector space model is the cosine of the angle between them
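A minimal Python sketch of the scheme on this slide. It assumes raw counts for tf, idf = log(N / df), and cosine similarity as the similarity measure; the helper names `tfidf_vector` and `cosine_similarity` and the three toy documents are illustrative and do not come from the presentation.

```python
import math
from collections import Counter

def tfidf_vector(tokens, doc_freq, num_docs):
    """Map each term to tf * idf, with tf the raw count and idf = log(N / df)."""
    counts = Counter(tokens)
    return {
        term: tf * math.log(num_docs / doc_freq[term])
        for term, tf in counts.items()
        if doc_freq.get(term, 0) > 0  # ignore terms that are not in the dictionary
    }

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy collection (hypothetical data, for illustration only).
docs = [
    "gpu query processing".split(),
    "cpu query algorithms".split(),
    "gpu sorting algorithms".split(),
]
doc_freq = Counter(term for doc in docs for term in set(doc))
vectors = [tfidf_vector(doc, doc_freq, len(docs)) for doc in docs]

# The query is treated as a mini document and vectorized the same way.
query = tfidf_vector("gpu query".split(), doc_freq, len(docs))
scores = [cosine_similarity(query, vec) for vec in vectors]
best = max(range(len(docs)), key=scores.__getitem__)
```

Here the first document contains both query terms, so it receives the highest score; the other two each share one term with the query and tie.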

• A lot of research has been done to develop efficient algorithms for the CPU that improve query response time
• We look at the task of improving query response time from a different perspective
• Instead of focusing only on writing efficient algorithms for the CPU, we shift our focus to the processor itself and ask: “Can we accelerate search engine query processing using the GPU?”

• The GPU’s programming model is highly suitable for processing data in parallel
• It allows programmers to define a grid of thread blocks; each thread in a thread block can execute a subset of the operations in parallel
• This is useful for information retrieval, as the score of each document can be computed in parallel
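The mapping from threads to documents can be sketched in plain Python. This is only an emulation: the sequential loops stand in for blocks and threads that a GPU would run in parallel, the index formula mirrors the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x` for a one-dimensional grid, and `score_all` and `score_fn` are hypothetical names.

```python
def global_thread_id(block_idx, block_dim, thread_idx):
    """Index of a thread within the whole 1-D grid."""
    return block_idx * block_dim + thread_idx

def score_all(num_docs, score_fn, block_dim=4):
    """Emulate a 1-D grid in which each thread scores exactly one document."""
    grid_dim = (num_docs + block_dim - 1) // block_dim  # number of blocks, rounded up
    scores = [None] * num_docs
    for block_idx in range(grid_dim):        # on a GPU, blocks are scheduled independently
        for thread_idx in range(block_dim):  # on a GPU, these threads run in parallel
            doc_id = global_thread_id(block_idx, block_dim, thread_idx)
            if doc_id < num_docs:            # guard: the last block may be only partly used
                scores[doc_id] = score_fn(doc_id)
    return scores

# Ten documents with blocks of four threads need three blocks;
# the last two threads of the third block do nothing.
example = score_all(10, lambda doc_id: doc_id * doc_id, block_dim=4)
```

Because each thread writes only to its own slot in `scores`, no synchronization between threads is needed for this step.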

• Ding et al., in their paper “Using Graphics Processors for High Performance IR Query Processing,” implement a variant of the vector space model, Okapi BM25, on the GPU and demonstrate promising results
• Okapi BM25 scores a document by summing, over the query terms, an idf weight multiplied by a term-frequency factor that saturates and is normalized by document length
• In particular, they provide data-parallel algorithms for inverted list intersection, list compression, and top-k scoring
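For reference, one common textbook form of BM25 can be sketched in Python as follows. This is not necessarily the exact variant used by Ding et al.: the non-negative idf form `log(1 + (N - df + 0.5) / (df + 0.5))`, the defaults `k1=1.2` and `b=0.75`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with a common form of Okapi BM25."""
    counts = Counter(doc_tokens)
    # Document-length normalization: longer-than-average documents are penalized.
    length_norm = 1.0 - b + b * len(doc_tokens) / avg_doc_len
    score = 0.0
    for term in query_terms:
        tf = counts.get(term, 0)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        df = doc_freq[term]
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score

# Toy collection (hypothetical data, for illustration only).
docs = [
    "gpu query processing".split(),
    "cpu query algorithms".split(),
    "gpu sorting algorithms".split(),
]
doc_freq = Counter(term for doc in docs for term in set(doc))
avg_doc_len = sum(len(doc) for doc in docs) / len(docs)
bm25_scores = [
    bm25_score(["gpu", "query"], doc, doc_freq, len(docs), avg_doc_len)
    for doc in docs
]
```

Each document's score depends only on that document and on collection-wide statistics, which is why the computation parallelizes across documents.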

• We propose an efficient implementation of the second ranking model, the LM-based (language model) approach to document scoring, on the GPU
• Method:
  • Apply a divide-and-conquer approach, as we need to compute P(q|d) for each document in the collection
  • Each block in the GPU would calculate the scores of a subset of the documents, sort the scores, and transfer the results to an array in the GPU’s global memory
  • After all the blocks have written their sorted scores to the array in global memory, we would use a parallel merge algorithm to merge the results and return the top k results
• Satish et al., in their paper “Designing Efficient Sorting Algorithms for Manycore GPUs,” provide an efficient implementation of merge sort that they report to be the fastest among the implementations in the literature
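The proposed method can be sketched sequentially in Python. Several things here are assumptions rather than part of the proposal: the slide does not say how P(q|d) is estimated, so the sketch uses a unigram model with Jelinek-Mercer smoothing (`lam`); each loop iteration stands in for one GPU block; and `heapq.merge` is a sequential stand-in for the parallel merge. The function names and toy documents are illustrative.

```python
import heapq
import itertools
import math
from collections import Counter

def log_query_likelihood(query_terms, doc_tokens, coll_counts, coll_len, lam=0.5):
    """log P(q|d) under a unigram model with Jelinek-Mercer smoothing (assumed)."""
    counts = Counter(doc_tokens)
    total = 0.0
    for term in query_terms:
        if coll_counts[term] == 0:
            continue  # skip terms unseen in the collection to avoid log(0)
        p_doc = counts[term] / len(doc_tokens)
        p_coll = coll_counts[term] / coll_len
        total += math.log((1.0 - lam) * p_doc + lam * p_coll)
    return total

def top_k(query_terms, docs, k, block_size=2, lam=0.5):
    """Score documents block by block, sort each block, then merge and keep k."""
    coll_counts = Counter(term for doc in docs for term in doc)
    coll_len = sum(coll_counts.values())
    sorted_blocks = []
    for start in range(0, len(docs), block_size):  # one iteration per emulated block
        end = min(start + block_size, len(docs))
        block = [
            (log_query_likelihood(query_terms, docs[i], coll_counts, coll_len, lam), i)
            for i in range(start, end)
        ]
        block.sort(reverse=True)                   # best score first within the block
        sorted_blocks.append(block)
    merged = heapq.merge(*sorted_blocks, reverse=True)  # k-way merge of sorted runs
    return list(itertools.islice(merged, k))

# Toy collection of non-empty documents (hypothetical data).
docs = [
    "gpu query processing".split(),
    "cpu query algorithms".split(),
    "gpu sorting algorithms".split(),
    "gpu gpu query".split(),
]
result = top_k(["gpu", "query"], docs, k=2)
top_ids = [doc_id for _, doc_id in result]
```

Because the merge only needs to yield the first k items, the merged list never has to be materialized in full; on a GPU the same property lets the final step stop early.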