What is Information Retrieval (IR)?

What is Information Retrieval (IR)?
Adapted from UCB Course SIMS 202 and IIT Course on IR

What is information retrieval?
Gathering information from one or more sources, based on a need.
Major assumption: the information exists.
Information is defined broadly. Sources of information include:
  Other people
  Archived information (libraries, maps, etc.)
  The web
  Radio, TV, etc.

Information retrieved
  Impermanent information (e.g. a conversation)
  Documents: text, video, files, etc.

The information acquisition process
  Know what you want and go get it
  Ask questions of information sources as needed (queries): search
  Have information sent to you on a regular basis, based on some predetermined information need
These are the pull and push models, respectively.

What IR assumes
  Information is stored (or available)
  A user has an information need
  An automated system exists from which information can be retrieved
Why an automated system? Because the system works!

What IR is usually not about
  IR usually deals with unstructured data.
  Retrieval from databases is usually not considered IR: database querying assumes the data is in a standardized format.
  Transforming all information, news articles, and web sites into a database format is difficult for large data collections.

What an IR system should do
  Store/archive information
  Provide access to that information
  Answer queries with relevant information
  Stay current
Wish list:
  Understand the user's queries
  Understand the user's need
  Act as an assistant

How good is the IR system?
Measures of performance based on what the system returns:
  Relevance
  Coverage
  Recency
  Functionality (e.g. query syntax)
  Speed
  Availability
  Usability
  Time/ability to satisfy user requests
(Relevance is commonly quantified with precision and recall, sketched below.)
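
A minimal sketch of set-based precision and recall, the usual way relevance-based performance is quantified; the document ids and relevance judgments below are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0  # how much of what came back is useful
    recall = hits / len(relevant) if relevant else 0.0       # how much of what is useful came back
    return precision, recall

# Hypothetical judgments: the system returned 4 docs, 3 of the 5 relevant ones among them.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 7, 9})
print(p, r)  # 0.75 0.6
```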

How do IR systems work?
Algorithms implemented in software:
  Gathering methods
  Storage methods
  Indexing
  Retrieval
  Interaction
(A minimal indexing-and-retrieval sketch follows.)
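
The storage and retrieval steps usually revolve around an inverted index. A minimal sketch, assuming whitespace tokenization and a Boolean AND retrieval model; both are simplifications of what production systems do:

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index: each term maps to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "hippos in a zoo", 2: "zoo opening hours", 3: "wild hippos"}
index = build_index(docs)
print(boolean_and(index, "hippos zoo"))  # {1}
```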

Memex (Vannevar Bush, 1945)

Some IR History
  Roots in the scientific "information explosion" following WWII
  Interest in computer-based IR from the mid-1950s
  H. P. Luhn at IBM (1958)
  Probabilistic models at RAND (Maron & Kuhns, 1960)
  Boolean system development at Lockheed (1960s)
  Vector space model (Salton at Cornell, 1965)
  Statistical weighting methods and theoretical advances (1970s)
  Refinements and advances in application (1980s)
  User interfaces, large-scale testing and application (1990s)
  Then came the web and search engines, and everything changed

Existing IR systems: search engines

A Typical Web Search Engine
[Diagram: Web → Crawler → Indexer → Index; Users → Interface → Query Engine → Index]

Crawlers
Web crawlers (spiders) gather information (files, URLs, etc.) from the web.
They are primitive IR systems in their own right; a toy sketch follows.
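
A toy breadth-first crawler built only on the Python standard library. The seed URL, the page limit, and the absence of robots.txt handling and politeness delays are deliberate simplifications; a real crawler needs all of these:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def crawl(seed, max_pages=10):
    """Breadth-first fetch starting from seed; returns {url: html}."""
    frontier, seen, pages = deque([seed]), {seed}, {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="replace")
        except (OSError, ValueError):
            continue  # unreachable page or unsupported URL scheme
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links against the current page
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages
```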

Finding Out About (FOA) (after R. Belew)
Three phases:
  Asking a question (the information need)
  Constructing an answer (IR proper)
  Assessing the answer (evaluation)
All part of an iterative process.

What is different about IR from other areas of computer science?
Many problems have a right answer: "How much money did you make last year?"
IR problems usually don't: "Find all documents relevant to 'hippos in a zoo'."

IR is an Iterative Process
[Diagram: the searcher cycles between goals, a workspace, and repositories]

[Diagram, built up over four slides: the user's information need is expressed as text input and parsed into a query; collections are pre-processed into an index; the query is ranked or matched against the index; results feed back into query reformulation]
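
A sketch of the "rank or match" step using the vector space model mentioned in the history slide: documents and queries become term-weight vectors, scored by cosine similarity. The tokenization and tf-idf weighting here are illustrative choices, not the only ones:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Turn each document into a {term: tf*idf weight} mapping."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.lower().split()))
    vectors = {}
    for doc_id, text in docs.items():
        tf = Counter(text.lower().split())
        vectors[doc_id] = {t: f * math.log(n / df[t]) for t, f in tf.items()}
    return vectors

def rank(query, vectors):
    """Order doc ids by cosine similarity to the query, best first."""
    q = Counter(query.lower().split())
    def score(vec):
        dot = sum(w * q[t] for t, w in vec.items() if t in q)
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return dot / norm  # the query's norm is constant across docs, so it is omitted
    return sorted(vectors, key=lambda d: score(vectors[d]), reverse=True)
```

Query reformulation would then adjust the query counts, for example by adding terms from documents the user marks as relevant (relevance feedback).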

Question Asking
  The person asking is the "user"
  In a frame of mind, a cognitive state
  Aware of a gap in their knowledge
  May not be able to fully define this gap
The paradox of finding out about something: if the user knew the question to ask, there would often be no work to do.
  "The need to describe that which you do not know in order to find it" (Roland Hjerppe)
The query is the external expression of this ill-defined state.

Question Answering
Consider first that the question answerer is human:
  Can they translate the user's ill-defined question into a better one?
  Do they know the answer themselves?
  Are they able to verbalize this answer?
  Will the user understand this verbalization?
  Can they provide the needed background?
Now consider that the answerer is a computer system.

Assessing the Answer
How well does it answer the question?
  A complete answer? Partial? Background information? Hints for further exploration?
How relevant is it to the user?
This introduces the notion of relevance.

IR is usually a dialog
  The exchange doesn't end with the first answer
  The user can recognize elements of a useful answer
  Questions and understanding change as the process continues

A sketch of a searcher...
"moving through many actions towards a general goal of satisfactory completion of research related to an information need" (after Bates, 1989)
[Diagram: a winding path through successive queries Q0 through Q5]

Berry-picking model
  Berry-picking is greedy search: grab what you can see or what is nearby
  The query is continually shifting
  New information may yield new ideas and new directions
  The information need is not satisfied by a single, final retrieved set; it is satisfied by a series of selections and bits of information found along the way

Information Seeking Behavior
Two parts of the process:
  Search and retrieval
  Analysis and synthesis of search results

Search Tactics and Strategies
  Search tactics (Bates, 1979)
  Search strategies (Bates, 1989; O'Day and Jeffries, 1993)

Tactics vs. Strategies
  Tactic: short-term goals and maneuvers; operators, actions
  Strategy: overall planning; linking a sequence of operators together to achieve some end

Restricted Form of the IR Problem
  The system has available only pre-existing, "canned" text passages.
  Its response is limited to selecting from these passages and presenting them to the user.
  It must select, say, 10 or 20 passages out of millions or billions! (A top-k selection sketch follows.)
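
Picking a handful of best passages out of millions does not require sorting them all. A minimal sketch using a bounded heap; the scores and ids are hypothetical:

```python
import heapq

# Hypothetical (score, doc_id) pairs, as produced by a ranking function.
scored = [(0.12, "d1"), (0.87, "d2"), (0.45, "d3"), (0.33, "d4")]

top = heapq.nlargest(2, scored)  # O(n log k) instead of O(n log n) for a full sort
print(top)  # [(0.87, 'd2'), (0.45, 'd3')]
```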

Information Retrieval
Revised task statement: build a system that retrieves documents that users are likely to find relevant to their queries.
This set of assumptions underlies the field of information retrieval.

Structure of an IR System
[Diagram, adapted from Soergel, p. 19: a storage line takes documents & data through descriptive and subject indexing into a store of document representations; a search line takes interest profiles & queries, formulated in terms of descriptors, into a store of profiles/search requests; the "rules of the game" are the rules for subject indexing plus a thesaurus (lead-in vocabulary and indexing language); comparison/matching of the two stores yields potentially relevant documents]

Measures of performance
How good is that IR system?
BUDLITE SEARCH: never fills you up.

Is IR Knowledge Creation?
It is, if what is collected is indexed and used.