Lecture 8: Information Retrieval Introduction


Information Retrieval Introduction
  Databases
    Very formal & logical
    Input into them is (or can be) very tightly constrained
    In turn, DB queries are written assuming those constraints
  Information Retrieval Systems
    Empirical
    Cognitive modeling: the way we think

Information Retrieval Introduction
  Queries are based on 'things already there'
    Words, documents
  What are the characteristics of these things?
    Total number of words in the English language
    What are the most common words? The least common words?
    How many total 'documents' are there in the world?
    How many web pages are there?
    What kind of structure does the web have?
    How rapidly is it changing?
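The questions about most and least common words can be explored on any collection by counting word occurrences. The sketch below is only an illustration: the three-document collection and the function name `word_frequencies` are made up for this example, and the simple regular expression is an assumed stand-in for real tokenization.

```python
import re
from collections import Counter


def word_frequencies(documents):
    """Count how often each lowercased word occurs across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical toy collection, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]

freq = word_frequencies(docs)
most_common = freq.most_common(2)                             # [('the', 4), ('cat', 2)]
rare_words = sorted(w for w, c in freq.items() if c == 1)     # words seen exactly once
```

Even on a tiny collection, a few words dominate while most occur once, which is the kind of characteristic the slide asks about.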

Information Retrieval Introduction
  Users have:
    An information need
    Use of information
  In an IR system, the user dynamically iterates with the system, e.g. "Was this helpful?"

Information Retrieval Introduction
  Similar, but not identical, architectures

  DBMS                          IR
  Data                          Documents
  DBMS                          IRS
  Database Engine               Search Engine
  Query Processor               Query Processor
  UI                            UI
  Queries & Reports             Retrieved Output
  Interface to another system   Interface to another system

Information Retrieval Introduction
  Documents: Medline, Westlaw, etc.
  Various retrieval methods: Boolean, ranked with weights, vector space
  IRS
    Search Engine: SilverPlatter, Dialog, Inktomi
    Query Processor
  UI
  Retrieved Output
  Interface to another system
  Post-processing
  Value add
  Via Web GUI, command line

IRS Components
  Document preparation & analysis
  Task Definition
  Databases
  Indexing
  Search/Retrieval Engines
  Interfaces
  Usability & Cognitive Tools
  System Evaluation

Document Preparation & Analysis
  Formatting tools
    Mapping to/from formats (XML, PDF, text, PostScript, etc.)
  Natural Language Processing / Feature Extraction
    Stemming
    Parsing, word sense disambiguation, morphology
    Tokenization
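Tokenization and stemming can be sketched in a few lines. This is a deliberately crude illustration, not a real stemmer: the suffix list and the names `tokenize`, `crude_stem`, and `prepare` are assumptions made up for this example, and production systems use far more careful rules.

```python
import re

# Illustrative suffix list only; a real stemmer uses a much richer rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def prepare(text):
    """Tokenize a document and reduce each token to its crude stem."""
    return [crude_stem(token) for token in tokenize(text)]


terms = prepare("Indexing indexed documents")   # ['index', 'index', 'document']
```

The point of stemming is that "Indexing" and "indexed" end up as the same index term, so a query for one can match the other.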

Task Definition
  Ad hoc
  Filtering, selective dissemination
  Cross lingual retrieval
  Categorization
  Topic detection & tracking
  Redundancy reduction
  Info synthesis/value add
  Cross doc/cross time summarization
  Presentation/visualization
  Info delivery when & where needed
  Info assistance
  Decision support
  Online analysis
  Resource discovery

IR Databases
  Bibliographic
  Full text
  Multi-media
  Audio & video
  Web data

Human Indexing & Categorization
  In Everything Is Miscellaneous, Weinberger describes 3 orders of categorization:
    1st order: organize the things themselves (made of atoms, taking up space), such as silverware in a drawer or books on a shelf
    2nd order: a reference to the things themselves, such as a card catalogue that points to the physical location of the 1st-order thing (but doesn't necessarily say much about what's inside)
    3rd order: made of bits (taking up virtually no space), and can get to the things 'inside'
  Reference: Everything Is Miscellaneous

Indexing
  Automatic indexing
    Algorithms to organize and weight text in documents
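One common way to "organize and weight" text is an inverted index whose entries carry tf-idf weights. The slide does not name a specific algorithm, so treat the following as an assumed example: the function `build_index`, the whitespace tokenization, and the three toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(documents):
    """Return {term: {doc_id: tf-idf weight}} for a dict of doc_id -> text."""
    term_counts = {
        doc_id: Counter(text.lower().split())
        for doc_id, text in documents.items()
    }
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(documents)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])  # rarer terms weigh more
            index[term][doc_id] = tf * idf
    return dict(index)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "boolean retrieval of documents",
    "d2": "ranked retrieval with weights",
    "d3": "weights weights everywhere",
}
index = build_index(docs)
```

Here `index["retrieval"]` lists d1 and d2, and "weights" gets a higher weight in d3 than in d2 because it occurs twice there.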

Retrieval/Matching
  Boolean & exact match
  Weighted or partial match
  Link analysis
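The difference between exact and partial matching can be seen on a toy collection. This is a minimal sketch under stated assumptions: the functions `boolean_and` and `ranked_match`, the occurrence-count scoring, and the documents are invented for illustration, and link analysis is not covered here.

```python
def boolean_and(query_terms, documents):
    """Exact match: ids of documents that contain every query term."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if set(query_terms) <= set(text.lower().split())
    )


def ranked_match(query_terms, documents):
    """Partial match: score by total query-term occurrences, best first."""
    scores = {}
    for doc_id, text in documents.items():
        words = text.lower().split()
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, then by doc id to break ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "web search engine architecture",
    "d2": "search engine for legal search",
    "d3": "database engine design",
}
query = ["search", "engine"]

exact = boolean_and(query, docs)     # ['d1', 'd2']
ranked = ranked_match(query, docs)   # [('d2', 3), ('d1', 2), ('d3', 1)]
```

The Boolean query drops d3 entirely, while the weighted match still returns it, ranked last.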

Interfaces
  Web GUI
  'Local' GUI
  Command line
  Gesture: James Bond (Quantum of Solace), Minority Report

Knowledge Tools
  Dictionaries, Thesauri
  Gazetteers, CIA World Factbook
  Encyclopedias

Evaluation
  What questions to ask?
    Is the system actually used?
    Is it efficient?
    Is the system effective?
    Are users satisfied?
    Do they find relevant information? Complete information?
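The last question, relevant versus complete information, is often quantified with precision and recall. The slide does not name these measures, so this is an assumed illustration; the function `precision_recall` and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

p, r = precision_recall(retrieved, relevant)   # p = 0.5, r = 2/3
```

Precision addresses "do they find relevant information?" and recall addresses "complete information?"; the other questions (usage, efficiency, satisfaction) need different measurements.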

Reading
  Read "As We May Think": http://www.theatlantic.com/doc/194507/bush