 IR: representation, storage, organization of, and access to information items  Focus is on the user information need  User information need:  Find.

Presentation transcript:

 IR: representation, storage, organization of, and access to information items  Focus is on the user information need  User information need:  Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament.  Emphasis is on the retrieval of information (not data)

Motivation ► IR at the center of the stage  IR in the last 20 years: ► classification and categorization ► systems and languages ► user interfaces and visualization  Still, the area was seen as being of narrow interest  The advent of the Web changed this perception once and for all ► universal repository of knowledge ► free (low-cost) universal access ► no central editorial board ► many problems, though: IR is seen as key to finding the solutions!

Motivation ► Data retrieval  which docs contain a set of keywords?  Well defined semantics  a single erroneous object implies failure! ► Information retrieval  information about a subject or topic  semantics is frequently loose  small errors are tolerated ► IR system:  interpret contents of information items  generate a ranking which reflects relevance  notion of relevance is most important

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).

 Salton-1989: “Information-retrieval systems process files of records and requests for information, and identify and retrieve from files certain records in response to the information requests. The retrieval depends on similarity between records and queries.”  Kowalski-1997: “An information retrieval system is a system that is capable of storage, retrieval, and maintenance of information.”

 IR is a branch of applied computer science focusing on the representation, storage, organization, access, and distribution of information.  IR involves helping users find information that matches their information needs. [Figure labels: System-centered view, User-centered view]

 IR systems contain three components:  System  People  Documents (information items) [Figure labels: User, System, Documents]

Basic Concepts ► The User Task  Retrieval ► information or data  Browsing (main objectives are not clearly defined and might change during the interaction) ► Task of hypertext  Modern digital libraries and web interfaces try to combine retrieval + browsing ► WWW (retrieval & browsing): either pulling or pushing [Figure labels: Retrieval, Browsing, Database]

Basic Concepts ► Logical view of the documents: text operations (or transformations) reduce the complexity of the document representation and allow moving the logical view from that of full text to that of a set of index terms ► Several intermediate logical views (of a document) might be adopted [Figure labels: Docs, structure, accents and spacing, stopwords, noun groups, stemming, manual indexing; outputs: structure, full text, index terms]

► “Stop words”  Certain words are considered irrelevant and are not placed in the bag (e.g., “the”, HTML tags)  “Stemming” and other content analysis  Stemming: reduce terms to their roots before indexing  Use English-specific rules to convert words to their basic form; example: surfing, surfed → surf  Identification of noun groups: eliminates adjectives, adverbs, and verbs ► The logical view of the docs might shift
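The text operations on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any particular IR system: the stop list, the suffix rules, and the function names (`strip_html_tags`, `naive_stem`, `index_terms`) are all assumptions made for the example, and the suffix stripping is far cruder than a real stemmer.

```python
import re

# Hypothetical stop list for illustration; real systems use much longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "is", "was", "to"}

# Illustrative English-specific suffix rules (an assumption, not a real stemmer).
SUFFIXES = ("ing", "ed", "s")


def strip_html_tags(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Move from the full-text view to a list of index terms."""
    tokens = re.findall(r"[a-z]+", strip_html_tags(text).lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(index_terms("<b>Surfing</b> the web and surfed"))
```

Both "Surfing" and "surfed" reduce to the same index term here, which is the effect the slide describes; the tags and the stop words never reach the index.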

[Figure labels: Processor, Document, Input, Feedback, Output, Queries]

The Retrieval Process [Figure: components User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, DB Manager Module, Index, and Text Database; flows labeled user need, user feedback, query, logical view, inverted file, retrieved docs, ranked docs, and text]

 Text operations: form index words (tokens)  Tokenization  Stopword removal  Stemming  Indexing: constructs an inverted index of word-to-document pointers  Mapping from keywords to document ids  Searching: retrieves documents that contain a given query token from the inverted index  Ranking: scores all retrieved documents according to a relevance metric  A ranking is based on fundamental premises regarding the notion of relevance, such as:  Common set of index terms  Sharing of weighted terms  Likelihood of relevance  Each set of premises leads to a distinct IR model
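The indexing, searching, and ranking steps above can be sketched as a toy pipeline. The sample documents, the function names, and the scoring rule (summed term frequency) are assumptions chosen for illustration; the slides do not prescribe a particular relevance metric.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Text operation: lowercase the text and split it into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, query):
    """Searching and ranking: fetch postings for each query term, then
    order documents by summed term frequency (ties broken by doc id)."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical three-document collection.
DOCS = {
    1: "college tennis teams in the NCAA tournament",
    2: "tennis rackets and tennis balls",
    3: "college admission guide",
}
INDEX = build_inverted_index(DOCS)

print(search(INDEX, "tennis"))
```

A different set of premises about relevance would replace only the scoring inside `search`; the inverted index itself stays the same.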

 User interface: manages interaction with the user:  Query input and document output  Visualization of results  Relevance feedback  Query operations: transform the query to improve retrieval:  Query expansion using a thesaurus (chpt. 2)  Query transformation using relevance feedback (chpt. 2)
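Query expansion with a thesaurus might look like the sketch below. The `THESAURUS` entries and the `expand_query` helper are hypothetical; a real system would draw synonyms from a curated or automatically built thesaurus.

```python
# Hypothetical toy thesaurus mapping a term to its synonyms.
THESAURUS = {
    "restaurant": ["cafe", "diner"],
    "car": ["automobile"],
}


def expand_query(terms, thesaurus=THESAURUS):
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["cheap", "restaurant"]))
```

The expanded term list is then passed to the searching step in place of the original query.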

 The simplest notion of relevance is that the query string appears verbatim (order is important) in the document  A slightly less strict notion is that the words in the query appear frequently in the document, in any order (bag of words)  May not retrieve relevant documents that include synonymous terms  Restaurant vs. café  May retrieve irrelevant documents that include ambiguous terms  Apple (company or fruit)  Bit (unit of data or act of eating)
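The two notions of relevance above can be contrasted in a short sketch. The function names and the sample sentences are assumptions for illustration only.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def verbatim_match(query, document):
    """Strict notion: the query tokens appear contiguously and in order."""
    q, d = tokens(query), tokens(document)
    return bool(q) and any(
        d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1)
    )


def bag_of_words_score(query, document):
    """Looser notion: count occurrences of query words, ignoring order."""
    counts = Counter(tokens(document))
    return sum(counts[t] for t in set(tokens(query)))


doc = "The apple pie recipe uses one apple"
print(verbatim_match("apple pie", doc))      # query words in document order
print(verbatim_match("pie apple", doc))      # same words, different order
print(bag_of_words_score("pie apple", doc))  # order no longer matters
print(bag_of_words_score("restaurant", "a quiet cafe"))  # synonym is missed
```

The last call illustrates the synonym problem: the document is about a café, but a purely lexical score cannot see that.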

 Data  String of symbols associated with objects, people, and events  Values of an attribute  Data need not have meaning to everyone  Data must be interpreted with associated attributes.

 Information  The meaning of the data interpreted by a person or a system  Data that changes the state of a person or system that perceives it.  Data that reduces uncertainty.  If data contain no uncertainty, there is no information in the data.  Examples: It snows in the winter. It does not snow this winter.
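One common way to make "data that reduces uncertainty" concrete is Shannon's self-information, which the slides do not name; it is brought in here only as an illustration. The probabilities assigned to the two example statements are invented for the sketch.

```python
import math


def self_information_bits(probability):
    """Shannon self-information, in bits, of an event with this probability."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / probability)


# Assumed probabilities, purely for illustration.
p_snows_in_winter = 0.95       # expected statement, little uncertainty removed
p_no_snow_this_winter = 0.05   # surprising statement, more uncertainty removed

print(self_information_bits(p_snows_in_winter))
print(self_information_bits(p_no_snow_this_winter))
```

Under this measure a statement that is certain (probability 1) carries zero bits, which matches the slide's point that data without uncertainty carry no information.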

 Knowledge  Structured information  through structuring, information becomes understandable  Processed information  through processing, information becomes meaningful and useful  Information shared and agreed upon within a community [Figure labels: Data, information, knowledge]

 Strings of ASCII or Unicode symbols  structured by the author  indexed by information service providers  Representation of the natural languages people use  To convey meaning  To communicate between readers and authors  Data or information?  If it can be understood, it's information.  By whom? A person or a system?

 Logical unit of text  articles, books,  links, web pages  Other components that come with the text  figures, charts, graphics  multimedia

 Repository of human intellect  Rich and diverse resources for all answers.  If it is written, it is there (in text)  Meaningful and understandable (to users).  Simple ASCII representation  Free of pre-formatted structures  continuous  separated into documents  Easy to process by the computer  Machine intensive (not labor intensive)

 Massive  Any IR system needs the capability of large scale data processing.  Use of indexes and various representations are required.  Inconsistent  It’s a human language  Syntactical and semantic variances  Same information expressed in different ways.  Different information expressed in similar ways.  Incomplete  It uses common knowledge.  It’s an open system.

 Retrieval  What do we retrieve?  Data  Information  Knowledge  We retrieve documents that contain text, which carries information.  Information can be anywhere  in the text, in the links, in the process of text.

 Are they the same?  Text retrieval  Document retrieval  Information retrieval

 Conceptually, information retrieval is used to cover all related problems in finding needed information  Historically, information retrieval is about document retrieval, emphasizing document as the basic unit  Technically, information retrieval refers to (text) string manipulation, indexing, matching, querying, etc.

                      Data retrieval        Information retrieval
Content               Data                  Information
Data object           Table                 Document
Matching              Exact match           Partial match, best match
Items wanted          Matching              Relevant
Query language        SQL (artificial)      Natural
Query specification   Complete              Incomplete
Model                 Deterministic         Probabilistic
Structure             Highly structured     Less structured
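The "Matching" row of the comparison can be illustrated with a small sketch. The records, documents, function names, and the overlap-based score are all assumptions for the example; they stand in for an exact-match query and a best-match ranking, respectively.

```python
def data_retrieval(records, **conditions):
    """Exact match: a record satisfies every condition or is excluded."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in conditions.items())
    ]


def information_retrieval(docs, query):
    """Best match: rank documents by the fraction of query terms they contain."""
    q = set(query.lower().split())
    if not q:
        return []
    scored = [
        (len(q & set(text.lower().split())) / len(q), doc_id)
        for doc_id, text in docs.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


# Hypothetical structured records and unstructured documents.
records = [
    {"team": "tennis", "country": "USA"},
    {"team": "tennis", "country": "UK"},
]
docs = {"a": "college tennis team", "b": "tennis shoes", "c": "cooking"}

print(data_retrieval(records, team="tennis", country="USA"))
print(information_retrieval(docs, "college tennis"))
```

The exact-match function returns only records that satisfy the full specification, while the best-match function still returns a partially matching document, just ranked lower.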

 The goal of IR systems is to help users find information that satisfies their information needs.  The main process of IR systems is to match data abstracted from the real world to queries abstracted from users' information needs.  Information retrieval is much more difficult than data retrieval.