Tutorial #3

Retrieval models

Retrieval models match a query against documents in order to:
- separate documents into a relevant and a non-relevant class;
- rank the documents by relevance.

Common retrieval models:
- Boolean model
- Vector space model (VSM)
- Probabilistic models

Boolean model

- The Boolean model is the most common exact-match model.
- Queries are logical expressions with document features (terms) as operands.
- In the pure Boolean model, retrieved documents are not ranked.
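A minimal sketch of Boolean retrieval, assuming a hypothetical four-document toy collection and simple lowercase word tokenization: an inverted index maps each term to the set of documents that contain it, and AND, OR, NOT become set intersection, union, and complement.

```python
import re
from collections import defaultdict

# Hypothetical toy collection, for illustration only.
docs = {
    1: "bread and butter",
    2: "bread recipes",
    3: "cake recipes",
    4: "butter cake",
}

# Inverted index: term -> set of IDs of the documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in re.findall(r"[a-z]+", text.lower()):
        index[term].add(doc_id)

all_ids = set(docs)

# Query: (bread OR cake) AND NOT butter
# OR -> union (|), AND -> intersection (&), NOT -> complement w.r.t. all docs (-)
hits = (index["bread"] | index["cake"]) & (all_ids - index["butter"])
print(sorted(hits))  # [2, 3]
```

The answer is a set: every matching document qualifies equally, which is why the pure Boolean model produces no ranking (the `sorted` call only makes the printout deterministic).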

Example (AND is evaluated before OR):

{D7} OR ({D1, D2, D5} AND {D2, D4, D5, D6, D8}) = {D7} OR {D2, D5} = {D2, D5, D7}
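Posting lists are usually kept sorted so that AND and OR can be computed with a linear-time merge rather than general set operations. A sketch evaluating the example above, writing Dn as the integer n; the slide does not name the query terms, so the three posting lists are given directly (function names are hypothetical):

```python
def intersect(p1, p2):
    """AND: walk two sorted posting lists in step, keeping shared IDs."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def union(p1, p2):
    """OR: merge two sorted, duplicate-free posting lists, keeping each ID once."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            out.append(p1[i])
            i += 1
        else:
            out.append(p2[j])
            j += 1
    out.extend(p1[i:])  # whatever remains in either list is already sorted
    out.extend(p2[j:])
    return out


# D7 OR (D1,D2,D5 AND D2,D4,D5,D6,D8); the AND is evaluated first.
result = union([7], intersect([1, 2, 5], [2, 4, 5, 6, 8]))
print(result)  # [2, 5, 7]
```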

Vector space model (VSM)

Documents and queries are represented as vectors:

$$d_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j}), \qquad q = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$$

where t is the number of index terms. Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero.

[Figure: term-by-query matrix (terms K0 to K5, queries Q0 to Q4) and term-by-document matrix (terms K0 to K5, documents D0 to D4)]

Vector space model (VSM)

Several different ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is tf-idf weighting:

tf-idf weighting
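One widely used form of tf-idf, given here as an assumption since the tutorial's exact variant is not stated (log-scaled tf or a different log base are also common), is

$$w_{i,j} = \mathrm{tf}_{i,j} \times \log\frac{N}{\mathrm{df}_i}$$

where tf_{i,j} is the number of occurrences of term i in document j, df_i is the number of documents containing term i, and N is the number of documents. A minimal sketch on a hypothetical, already-stemmed toy collection:

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, where
    weight = raw term frequency * ln(N / document frequency)."""
    n = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts once per distinct term
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]


# Hypothetical toy collection (tokens already stemmed).
docs = [
    ["bak", "bread", "bread"],
    ["pastr", "recipe"],
    ["bread", "recipe"],
]
weights = tfidf(docs)
for i, w in enumerate(weights):
    print(i, {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in every document gets idf = log(N/N) = 0 and contributes nothing. Switching the log base multiplies every weight by the same constant, which leaves cosine rankings unchanged.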

Vector space model (VSM)

Example documents:
- D0: 'How to Bake Bread Without Recipes'
- D1: 'The Classic Art of Viennese Pastry'
- D2: 'Numerical Recipes: The Art of Scientific Computing'
- D3: 'Breads, Pastries, Pies and Cakes: Quantity Baking Recipes'
- D4: 'Pastry: A Book of Best French Recipe'

Keywords (stems): ['bak', 'recipe', 'bread', 'cake', 'pastr', 'pie']

These generate a 6 × 5 term-by-document matrix (rows 'bak', 'recipe', 'bread', 'cake', 'pastr', 'pie'; columns D0 to D4).
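A sketch that builds this matrix from the five titles. The tutorial does not say which stemmer produced the keywords, so this assumes a word matches a keyword when it starts with that stem (a crude stand-in for a real stemmer such as Porter's):

```python
import re

DOCS = [
    "How to Bake Bread Without Recipes",
    "The Classic Art of Viennese Pastry",
    "Numerical Recipes: The Art of Scientific Computing",
    "Breads, Pastries, Pies and Cakes: Quantity Baking Recipes",
    "Pastry: A Book of Best French Recipe",
]
TERMS = ["bak", "recipe", "bread", "cake", "pastr", "pie"]


def term_counts(text):
    """Count, for each stem, the words in `text` that start with it.

    Assumption: prefix matching replaces a proper stemmer.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [sum(w.startswith(stem) for w in words) for stem in TERMS]


# One column per document, transposed so rows = terms, columns = D0..D4.
columns = [term_counts(title) for title in DOCS]
matrix = [list(row) for row in zip(*columns)]

print("       " + " ".join(f"D{j}" for j in range(len(DOCS))))
for stem, row in zip(TERMS, matrix):
    print(f"{stem:<6} " + " ".join(f"{v:>2}" for v in row))
```

No title contains a stem more than once, so every entry is 0 or 1; D3 is the only document containing all six terms.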

Query: "baking bread" is represented over the same six terms (non-zero only for 'bak' and 'bread') and compared with each column (D0 to D4) of the 6 × 5 term-by-document matrix.
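The tutorial does not state the similarity function; cosine similarity is the usual choice in the VSM and is assumed here. A sketch scoring the query against 0/1 document vectors read off the five titles (term order 'bak', 'recipe', 'bread', 'cake', 'pastr', 'pie'):

```python
import math

# 0/1 document vectors over ['bak', 'recipe', 'bread', 'cake', 'pastr', 'pie'].
DOC_VECTORS = {
    "D0": [1, 1, 1, 0, 0, 0],
    "D1": [0, 0, 0, 0, 1, 0],
    "D2": [0, 1, 0, 0, 0, 0],
    "D3": [1, 1, 1, 1, 1, 1],
    "D4": [0, 1, 0, 0, 1, 0],
}
# "baking bread" -> stems 'bak' and 'bread'.
QUERY = [1, 0, 1, 0, 0, 0]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


scores = {doc: cosine(QUERY, vec) for doc, vec in DOC_VECTORS.items()}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for doc, score in ranking:
    print(doc, round(score, 4))
```

Only D0 and D3 share terms with the query: D0 scores 2/√6 ≈ 0.8165 and D3 scores 2/√12 ≈ 0.5774, so D0 ranks first; D3 is penalized for its four extra terms. D1, D2, and D4 score 0.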

VSM implementation

- VSMranker.java ranks documents for a query.
- It provides functions for building different user interfaces.
- Standalone usage needs a document TDM and a query TDM (term-document matrix files):

```
java -cp ../java VSMranker cacm.tdm query.tdm 7
```

This retrieves the top 7 documents for the CACM queries.
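The tutorial does not show how VSMranker.java works internally. As a hedged illustration of what such a ranker has to do, here is a Python sketch (all names hypothetical, not VSMranker's API) that scores documents term-at-a-time through an inverted index, so only documents sharing a query term are touched, and keeps the best k with a heap, analogous to the `7` argument above:

```python
import heapq
import math
from collections import defaultdict


def build_index(doc_vectors):
    """Build term -> [(doc_id, weight)] postings and per-document vector lengths."""
    index = defaultdict(list)
    lengths = {}
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
        lengths[doc_id] = math.sqrt(sum(w * w for w in vec.values()))
    return index, lengths


def top_k(query_vec, index, lengths, k):
    """Return up to k (cosine score, doc_id) pairs, best first.

    Term-at-a-time scoring: only documents sharing a term with the
    query get an accumulator. Assumes non-negative weights.
    """
    acc = defaultdict(float)
    for term, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(term, []):
            acc[doc_id] += q_weight * d_weight
    q_len = math.sqrt(sum(w * w for w in query_vec.values()))
    # dot > 0 implies both lengths are non-zero, so no division by zero.
    scored = [(dot / (lengths[doc_id] * q_len), doc_id)
              for doc_id, dot in acc.items() if dot > 0]
    return heapq.nlargest(k, scored)


# Sparse 0/1 vectors for the five example titles.
docs = {
    "D0": {"bak": 1, "recipe": 1, "bread": 1},
    "D1": {"pastr": 1},
    "D2": {"recipe": 1},
    "D3": {"bak": 1, "recipe": 1, "bread": 1, "cake": 1, "pastr": 1, "pie": 1},
    "D4": {"recipe": 1, "pastr": 1},
}
index, lengths = build_index(docs)
results = top_k({"bak": 1, "bread": 1}, index, lengths, k=7)
for score, doc_id in results:
    print(doc_id, round(score, 4))
```

Even with k = 7, only D0 and D3 are returned here, because no other document shares a term with the query. On a collection the size of CACM, skipping non-matching documents and using a bounded heap instead of a full sort is what keeps ranking cheap.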

Exercise #3 (to be solved during the tutorial)
