IR Homework #2 By J. H. Wang Apr. 13, 2011. Programming Exercise #2: Query Processing and Searching Goal: to search for relevant documents Input: a query.

IR Homework #2 By J. H. Wang Apr. 13, 2011

Programming Exercise #2: Query Processing and Searching
Goal: to search for relevant documents
Input: a query
–(simple search: keyword, Boolean)
Output: a ranked list of search results from the Reuters collection
–(details to be described later)

Input: User Query
Simple search
–Keyword. Ex: Malaysia, Nuclear, …
–Free text. Ex: United Nations, Nuclear Submarine Fleet, …
–Simple Boolean search. Ex: Israel OR Pakistan, …
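The three query types above could be handled by a single routine over an inverted index. The following is a minimal sketch, not the required design: it assumes the index is a plain dict from lowercased term to a set of document IDs, that Boolean operators are written in uppercase, and that adjacent free-text terms are combined with OR. The names `tokenize` and `boolean_search` and the toy index are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_search(query, index):
    """Evaluate a flat query left to right and return matching doc IDs.

    Assumptions: AND/OR are uppercase; terms with no operator between
    them are joined with OR (one possible reading of free text).
    """
    result, op = None, "OR"
    for token in query.split():
        if token in ("AND", "OR"):
            op = token
            continue
        # Union of the postings of every term found in this token.
        postings = set().union(*(index.get(t, ()) for t in tokenize(token)))
        if result is None:
            result = postings
        elif op == "AND":
            result &= postings
        else:
            result |= postings
        op = "OR"  # default operator for the next bare term
    return result or set()


# Toy index with made-up document IDs, for illustration only.
index = {"israel": {1, 3}, "pakistan": {2, 3}, "nuclear": {3, 4}}
print(boolean_search("Israel OR Pakistan", index))
print(boolean_search("Israel AND Pakistan", index))
```

Treating free text as an OR query keeps recall high and leaves the ordering of results to the ranking step.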

Output: Ranked List
A ranked list of search results from the Reuters collection
–Term weighting scheme: TF-IDF
–Ranking: vector space model, i.e. the cosine similarity between query and document vectors
$w_{ij} = (1 + \log tf_{ij}) \cdot \log(N / df_i)$
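The weighting formula and cosine ranking above can be sketched as follows. This is an illustration under stated assumptions rather than a reference solution: the slide does not fix the base of the logarithm, so base 10 is assumed; the query is weighted with the same formula as the documents; and the three toy documents are invented, not taken from the Reuters collection. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Weight each known term as (1 + log tf) * log(N / df), base 10 assumed."""
    tf = Counter(tokens)
    return {
        term: (1 + math.log10(count)) * math.log10(n_docs / df[term])
        for term, count in tf.items()
        if term in df
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, docs):
    """Return (doc_id, score) pairs sorted by decreasing cosine score."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    doc_vectors = {d: tfidf_vector(t, df, n_docs) for d, t in docs.items()}
    query_vector = tfidf_vector(query_tokens, df, n_docs)
    scores = {d: cosine(query_vector, vec) for d, vec in doc_vectors.items()}
    return sorted(
        ((d, s) for d, s in scores.items() if s > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Invented toy documents, already tokenized.
docs = {
    "d1": ["bangladesh", "jute", "exports", "rise"],
    "d2": ["nuclear", "submarine", "fleet"],
    "d3": ["bangladesh", "bangladesh", "floods"],
}
for doc_id, score in rank(["bangladesh"], docs):
    print(doc_id, round(score, 3))
```

A real implementation would precompute the document vectors and their norms at indexing time instead of rebuilding them for every query.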

Example Output
Ex:
–Query: “Bangladesh”
–Result: …

Optional Features
Optional functionalities
–Better user interface for search
–Complex queries: phrase, wildcard, substring, proximity search, combinations of Boolean operators, … (Ch.2 & 3)
–Spell-correction, phonetic correction, … (Ch.3)
–Champion lists, impact-ordering, tiered index, … (Ch.7)
–Different ranking/term weighting schemes: variants of TF-IDF, … (Ch.6)
–Each optional feature should be switchable on/off by a parameter
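As one example of an optional feature that can be switched on or off by a parameter, here is a rough sketch of spell correction based on Levenshtein edit distance. The assignment does not prescribe this method; the function names, the `enabled` flag, and the `max_distance` threshold of 2 are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correct_term(term, vocabulary, enabled=True, max_distance=2):
    """Return the closest vocabulary term, or the term itself if none is close.

    `enabled` plays the role of the on/off parameter for this feature.
    """
    if not enabled or not vocabulary or term in vocabulary:
        return term
    # Ties on distance are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda v: (edit_distance(term, v), v))
    return best if edit_distance(term, best) <= max_distance else term


vocabulary = {"pakistan", "israel", "malaysia"}
print(correct_term("pakistn", vocabulary))
print(correct_term("pakistn", vocabulary, enabled=False))
```

Scanning the whole vocabulary is acceptable for a small collection; a larger one would usually narrow the candidates first, for instance with a k-gram index.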

Submission
Your submission *should* include
–The source code (and optionally your executable file)
–A one-page description that includes the following:
Major features of your work (ex: high efficiency, low storage, multiple input formats, huge corpus, …)
Major difficulties encountered
Special requirements for the execution environment (ex: Java Runtime Environment, special compilers, …)
For team work, the name of each member and the parts each member was responsible for, clearly identified
Due: two weeks (Apr. 27, 2011)

Submission Instructions
Programs or homework in electronic files must be submitted directly on the submission site:
–Submission site:
Username: your student ID
Password: (Please change your default password at your first login)
–Preparing your submission file: submit one single compressed file
Remember to specify the names and student IDs of your team members in the files and documentation
–If you cannot successfully submit your work, please contact the TA

Evaluation
Some example queries will be submitted to your program, and the ranked list will be checked for effectiveness (recall and precision)
–Minimum requirement: simple keyword and Boolean queries
Optional features will be considered as bonus
–Various query types, weighting schemes, efficient scoring and ranking, …
You may be required to give a demo if the TA is unable to run your submitted program
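Recall and precision can be checked locally before submission. The sketch below assumes you have your own set of relevance judgments for a test query; the slides do not say how the TA's judgments are obtained, and the document IDs shown are invented. The optional cutoff `k` is also an assumption.

```python
def precision_recall(ranked, relevant, k=None):
    """Return (precision, recall) for a ranked list, optionally cut off at rank k."""
    retrieved = ranked[:k] if k is not None else ranked
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four retrieved documents, three judged relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d3", "d9"}
print(precision_recall(ranked, relevant))
print(precision_recall(ranked, relevant, k=2))
```

Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved, so cutting the list off earlier typically trades recall for precision.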

Questions?