Homework #1 J. H. Wang Oct. 24, 2011.


Homework #1
Chap. 1: 1.2, 1.7
Chap. 2: 2.1, 2.8

1.2: Consider these documents. (Note: Please do NOT apply stemming or stopword removal.)
Doc1: breakthrough drug for schizophrenia
Doc2: new schizophrenia drug
Doc3: new approach for treatment of schizophrenia
Doc4: new hopes for schizophrenia patients
(a) Draw the term-document incidence matrix for this document collection. (Note: No need for positional information.)
(b) Draw the inverted index representation for this collection.
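As a reminder of the two data structures the exercise asks for, here is a minimal Python sketch. It runs on a hypothetical three-document toy collection, not the homework documents, and the function names `build_inverted_index` and `build_incidence_matrix` are illustrative assumptions. Tokenization is a plain whitespace split with no stemming or stopword removal, in line with the note in the exercise.

```python
from collections import defaultdict

# Hypothetical toy collection (NOT the homework documents).
docs = {
    1: "fast red car",
    2: "red bicycle",
    3: "fast train",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace tokenization only: no stemming, no stopword removal.
        for term in text.split():
            postings[term].add(doc_id)
    # Sort the dictionary by term and each postings list by document ID.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def build_incidence_matrix(docs):
    """Return (terms, doc_ids, rows); rows[i][j] is 1 if terms[i] occurs in doc_ids[j]."""
    index = build_inverted_index(docs)
    doc_ids = sorted(docs)
    terms = list(index)
    rows = [[1 if d in index[t] else 0 for d in doc_ids] for t in terms]
    return terms, doc_ids, rows


index = build_inverted_index(docs)
terms, doc_ids, rows = build_incidence_matrix(docs)

for term, row in zip(terms, rows):
    print(f"{term:10s} {row}  postings: {index[term]}")
```

Each row of the incidence matrix holds one 0/1 entry per document. The inverted index stores only the document IDs where the entry would be 1.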

1.7: Recommend a query processing order for
(tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)
given the following postings list sizes:

Term          Postings size
eyes          213312
kaleidoscope  87009
marmalade     107913
skies         271658
tangerine     46653
trees         316812
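One common heuristic for a query of this shape is to estimate the size of each OR group conservatively as the sum of its terms' postings sizes, then process the AND in increasing order of those estimates. The sketch below illustrates that heuristic on hypothetical terms and sizes, not the table above; the helper name `order_conjuncts` is an assumption.

```python
def order_conjuncts(conjuncts, df):
    """Sort OR groups by an upper bound on their result size.

    The bound for a group is the sum of the postings sizes of its terms,
    since a union can never be larger than that sum.
    """
    return sorted(conjuncts, key=lambda group: sum(df[t] for t in group))


# Hypothetical postings sizes (NOT the homework table).
df = {"apple": 500, "pear": 1200, "plum": 90, "fig": 300, "kiwi": 2000, "lime": 40}

# (apple OR pear) AND (plum OR fig) AND (kiwi OR lime)
query = [("apple", "pear"), ("plum", "fig"), ("kiwi", "lime")]

order = order_conjuncts(query, df)
for group in order:
    print(" OR ".join(group), "-> estimated size", sum(df[t] for t in group))
```

Intersecting the smallest estimated groups first keeps the intermediate results small, so the later and larger lists are merged against fewer candidates.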

2.1: Are the following statements true or false?
(a) In a Boolean retrieval system, stemming never lowers precision.
(b) In a Boolean retrieval system, stemming never lowers recall.
(c) Stemming increases the size of the vocabulary.
(d) Stemming should be invoked at indexing time but not while processing a query.

2.8: Assume a biword index in which we consider every pair of consecutive terms in a document as a phrase. Give an example of a document which will be returned for a query of “New York University” but is actually a false positive which should not be returned.
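To make the mechanism concrete, the following Python sketch builds a biword index and answers a phrase query by intersecting the postings of the phrase's consecutive word pairs. The documents and the query phrase are hypothetical and deliberately differ from the one in the exercise; the function names are illustrative assumptions.

```python
from collections import defaultdict


def biwords(text):
    """Return the consecutive word pairs of a text, lowercased."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def build_biword_index(docs):
    """Map every biword to the set of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for bw in biwords(text):
            index[bw].add(doc_id)
    return index


def phrase_query(index, phrase):
    """AND together the postings of every biword in the phrase."""
    result = None
    for bw in biwords(phrase):
        ids = index.get(bw, set())
        result = ids if result is None else result & ids
    return sorted(result or [])


# Hypothetical documents (NOT taken from the exercise).
docs = {
    1: "we serve hot dog food here",
    2: "the hot dog stand sells dog food too",
    3: "cold cat food",
}

index = build_biword_index(docs)
hits = phrase_query(index, "hot dog food")
print(hits)

# Compare against a literal substring check to spot false positives.
false_positives = [d for d in hits if "hot dog food" not in docs[d]]
print(false_positives)
```

A document matches as soon as it contains every biword of the phrase somewhere, even when the pairs are not adjacent to each other. That is the effect the exercise asks you to reproduce.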

Submission
Due: two weeks (Nov. 7, 2011)
Hand-written exercises: hand in your paper version in class
Programming exercises: to be announced

Any Questions or Comments?