PrasadL1IntroIR1 Information Retrieval Adapted from Lectures by Berthier Ribeiro-Neto (Brazil), Prabhakar Raghavan (Yahoo and Stanford) and Christopher.

Presentation transcript:

PrasadL1IntroIR1 Information Retrieval Adapted from Lectures by Berthier Ribeiro-Neto (Brazil), Prabhakar Raghavan (Yahoo and Stanford) and Christopher Manning (Stanford)

PrasadL1IntroIR2 Unstructured (text) vs. structured (database) data in 1996

PrasadL1IntroIR3 Unstructured (text) vs. structured (database) data in 2006

PrasadL1IntroIR4 Structured vs unstructured data
Structured data: information in “tables”

    Employee   Manager   Salary
    Smith      Jones     50000
    Chang      Smith
    Ivy        Smith

Typically allows numerical range and exact match (for text) queries, e.g., Salary < … AND Manager = Smith.
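A minimal Python sketch of the kind of structured query this slide describes: a numerical range condition combined with an exact text match. Only Smith's salary comes from the slide; the other two salaries and the threshold are hypothetical values chosen for illustration, and `Employee` and `select` are names invented for this sketch.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    manager: str
    salary: int


# Only Smith's salary (50000) appears on the slide; the other two
# salaries are hypothetical placeholders for illustration.
EMPLOYEES = [
    Employee("Smith", "Jones", 50000),
    Employee("Chang", "Smith", 70000),  # hypothetical salary
    Employee("Ivy", "Smith", 40000),    # hypothetical salary
]


def select(rows, max_salary, manager):
    """Return rows with salary < max_salary AND manager == manager."""
    return [r for r in rows if r.salary < max_salary and r.manager == manager]


# Hypothetical threshold: Salary < 60000 AND Manager = Smith
result = select(EMPLOYEES, max_salary=60000, manager="Smith")
print([r.name for r in result])
```

Every row either satisfies both conditions or is excluded; there is no notion of a partially matching row.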

PrasadL1IntroIR5 Unstructured data
Typically refers to free text
  • Data which does not have clear, semantically overt, easy-for-a-computer structure
Allows
  • Keyword-based queries including operators
  • More sophisticated “concept” queries, e.g., find all web pages dealing with drug abuse

PrasadL1IntroIR6 Semi-structured data
In fact almost no data is “unstructured”
  • E.g., this slide has distinctly identified zones such as the Title and Bullets
Facilitates “semi-structured” search such as
  • Title contains data AND Bullets contain search
… to say nothing of linguistic structure
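The zone query on this slide (Title contains data AND Bullets contain search) can be sketched in Python as follows. The dictionary layout, the `zone_match` helper, and the two sample slides are assumptions made for illustration, not a format the lecture prescribes.

```python
def zone_match(doc, zone, term):
    """True if `term` occurs as a whole word in the named zone of `doc`."""
    return term.lower() in doc.get(zone, "").lower().split()


# Hypothetical documents, each with a "title" zone and a "bullets" zone.
slides = [
    {"title": "Semi-structured data",
     "bullets": "Facilitates semi-structured search"},
    {"title": "What is IR?",
     "bullets": "Access to information items"},
]

# Title contains "data" AND Bullets contain "search"
hits = [s["title"] for s in slides
        if zone_match(s, "title", "data") and zone_match(s, "bullets", "search")]
print(hits)
```

The same term can match in one zone and not in another, which is what distinguishes this from a plain keyword query over the whole text.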

PrasadL1IntroIR7 What is IR?
Representation
  • Keywords/Phrases, Structure/Fonts, Counts, etc.
Organization and Storage
  • Inverted File Index, Compressed, etc.
  • Hardware Architecture and Memory Hierarchy
Access to information items
  • Interface: Spell-checker to tree-structured display
  • Visualization: Labeled Clusters, Timelines, Spring graphs, etc.

PrasadL1IntroIR8 Ultimate Focus of IR
Satisfying user information need
  • Emphasis is on retrieval of information (not data)
User information need: Examples
  • Printer reviews
  • Printer prices and availability
  • Words in which all vowels appear
  • Anagram/Permutations of art
Predicting which documents are relevant, and then linearly ranking them.
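Two of the example needs on this slide (words in which all vowels appear, anagrams of "art") concern the form of words rather than their topic. A small Python sketch of how such needs might be answered over a word list; the vocabulary and the function names are made up for illustration.

```python
VOWELS = set("aeiou")


def has_all_vowels(word):
    """True if every one of a, e, i, o, u occurs in the word."""
    return VOWELS <= set(word.lower())


def anagrams(word, vocabulary):
    """Vocabulary words that use exactly the same letters as `word`."""
    key = sorted(word.lower())
    return [w for w in vocabulary if sorted(w.lower()) == key]


# Hypothetical vocabulary.
vocab = ["education", "printer", "rat", "tar", "art", "sequoia", "cart"]

print([w for w in vocab if has_all_vowels(w)])
print(anagrams("art", vocab))
```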

PrasadL1IntroIR9 Information Need : Query, Relevancy An information need is the topic about which the user desires to know more, and is differentiated from a query, which is what the user conveys to the computer in an attempt to communicate the information need. A document is relevant if it is one that the user perceives as containing information of value with respect to their personal information need.

PrasadL1IntroIR10 DIKW Hierarchy
Data: Symbolic units
  • E.g., Customer records.
  • E.g., Bytes from sensors.
Information: Data with an interpretation (Who?, What?, When?, Where?)
  • E.g., Records of current/new customers grouped by age.
  • E.g., Variation in temperature readings.

PrasadL1IntroIR11 DIKW Hierarchy
Knowledge: Information organized with theoretical concepts or abstract ideas (How?)
  • E.g., How many customers have cancelled their accounts in the current fiscal year?
  • E.g., Analysis of temperature variation over the years and its causes.
Wisdom: Understanding of fundamental principles + Human Judgement
  • E.g., What strategies can be employed to retain customers in the face of cheaper alternatives?
  • E.g., Global warming issues and the future of Earth.

PrasadL1IntroIR12 [Figure: DIKW hierarchy (Clark 2004). Levels: Data, Information, Knowledge, Wisdom. Axes: Understanding (researching, absorbing, doing, interacting, reflecting) and Context (gathering of parts, connection of parts, formation of a whole, joining of wholes). Past: experience; future: novelty.]

PrasadL1IntroIR13 You see things; and you say "Why?" But I dream things that never were; and I say "Why not?" George Bernard Shaw

PrasadL1IntroIR14 Information vs Data Retrieval

                        Information retrieval                     Data retrieval
    DATA:               Unstructured: open to interpretation      Structured with well-defined semantics
    QUERY:              Usually incomplete or ambiguous           Well-defined semantics
                        (w.r.t. information need)
    QUALITY OF RESULTS: Partial match allowed,                    Exact match required - no or many results
                        relevance-based ranking
    FOUNDATIONS:        Probabilistic underpinnings               Algebra/Logic
    APPLICATION:        Library                                   Accounting
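The contrast between exact match and partial match with ranking can be sketched in Python. The scoring rule used here (fraction of query terms present in a document) is a deliberately simple stand-in chosen for illustration; it is not the probabilistic model the slide alludes to, and the sample records and documents are hypothetical.

```python
def exact_match(records, field, value):
    """Data retrieval: a record either satisfies the condition or it does not."""
    return [r for r in records if r[field] == value]


def ranked_match(docs, query):
    """Information retrieval: partial matches are allowed and ordered by score.

    Score = fraction of distinct query terms that occur in the document.
    """
    terms = set(query.lower().split())
    if not terms:
        return []
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split())) / len(terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical accounting records and text documents.
records = [{"account": "A-1", "status": "paid"},
           {"account": "A-2", "status": "due"}]
docs = {
    "d1": "multifunction printer reviews",
    "d2": "printer prices and availability",
    "d3": "library opening hours",
}

paid = exact_match(records, "status", "paid")
ranked = ranked_match(docs, "printer reviews")
print(paid)
print(ranked)
```

Here `d2` is still returned although it contains only one of the two query terms; an exact-match query would have dropped it.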

PrasadL1IntroIR15 User Task
  • Retrieval
      Purposeful – HP Multifunction Printer Information
  • Browsing
      Casual – Big Bang, CBR, Element Genesis, Supernova, ...
      Hyperlink-based
  • Filtering by Agents
      Push – Podcasts from B.B.C’s Naked Science
[Figure: Retrieval and Browsing over a Database]

PrasadL1IntroIR16 Logical View of Documents
Abstraction (essentials)
  • Structure, fonts, proximity, repetitions, etc.
[Figure: text operations applied to documents: structure recognition, accents and spacing, stopwords, noun groups, stemming, manual indexing. Resulting logical views: structure, full text, index terms.]
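The text operations named on this slide (accents, spacing, stopwords, stemming) can be sketched as a small Python pipeline that turns full text into index terms. The stopword list is a tiny illustrative sample and `naive_stem` is a crude suffix stripper standing in for a real stemmer; both are assumptions of this sketch, and noun-group detection and manual indexing are left out.

```python
import unicodedata

STOPWORDS = {"the", "of", "and", "a", "to", "in"}  # tiny illustrative list


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Full text -> index terms: accents, spacing, stopwords, stemming."""
    text = strip_accents(text).lower()
    # Treat anything that is not a letter or digit as spacing.
    tokens = "".join(ch if ch.isalnum() else " " for ch in text).split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


terms = index_terms("The résumés of the indexing engines")
print(terms)
```

Each step discards some detail of the original text, which is the abstraction the slide refers to.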

PrasadL1IntroIR17 The Retrieval Process
[Figure: components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Index, DB Manager Module, Text Database. Flows: user need, text, logical view, query, user feedback, inverted file, retrieved docs, ranked docs.]
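The indexing and searching stages of the retrieval process can be sketched in Python with a toy inverted file. This is a minimal sketch under simplifying assumptions: whitespace tokenization, conjunctive (AND) queries only, no ranking stage, and three hypothetical documents.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Indexing: map each term to the sorted ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Searching: documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = index.get(terms[0], [])
    for term in terms[1:]:
        result = intersect(result, index.get(term, []))
    return result


# Hypothetical text database.
docs = {
    1: "text database index",
    2: "inverted file index",
    3: "text operations and index terms",
}
index = build_inverted_index(docs)
print(search_and(index, "text index"))
```

The index is built once from the text database; each query then touches only the postings lists of its own terms.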

PrasadL1IntroIR18 IR Basics
Models and retrieval evaluation
Query languages and operations
  • Improve inferring query context (query expansion, relevance feedback)
Text operations
  • Improve gleaning of document semantics (stemming keywords)
Efficient Access: Index and Search
Visualization, Multimedia, Applications, …

PrasadL1IntroIR19 Clustering and classification
Clustering: Given a set of docs, group them into clusters based on their content.
Classification: Given a set of topics, plus a new doc D, decide which topic(s) D belongs to.
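The classification task can be sketched in Python as picking the topic whose description is most similar to the new document. Representing each topic by a short bag of words and comparing with cosine similarity is an assumption of this sketch, as are the two sample topics; it also returns a single topic, whereas the slide allows a document to belong to several.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(doc, topics):
    """Return the name of the topic most similar to `doc`."""
    vec = vectorize(doc)
    return max(topics, key=lambda name: cosine(vec, vectorize(topics[name])))


# Hypothetical topics, each described by a few representative words.
topics = {
    "printers": "printer ink toner paper print",
    "astronomy": "supernova star galaxy big bang",
}

print(classify("laser printer toner prices", topics))
print(classify("star formation after the big bang", topics))
```

Clustering uses the same kind of similarity measure but has no predefined topics; the groups emerge from the documents themselves.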

PrasadL1IntroIR20 The web and its challenges
Unusual and diverse documents
Unusual and diverse users, queries, information needs
Beyond terms, exploit ideas from social networks
  • link analysis, clickstreams, ...
How do search engines work? And how can we make them better?

PrasadL1IntroIR21 More sophisticated semi-structured search
Title is about Object Oriented Programming AND Author something like stro*rup
  • where * is the wild-card operator
Issues:
  • how do you process “about”?
  • how do you rank results?
The focus of XML search.
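The wild-card part of this query can be sketched in Python with the standard library's `fnmatch` module, where `*` matches any run of characters. The author list is hypothetical, and the "about" condition on the title is not covered, since that is the open issue the slide raises.

```python
import fnmatch


def wildcard_match(pattern, names):
    """Names matching a wild-card pattern, ignoring case.

    Note: fnmatch also gives special meaning to '?' and '[...]'.
    """
    return [n for n in names
            if fnmatch.fnmatchcase(n.lower(), pattern.lower())]


# Hypothetical author names found in an index.
authors = ["Stroustrup", "Strostrup", "Armstrong", "Strong"]

print(wildcard_match("stro*rup", authors))
```

A misspelled variant such as "Strostrup" still matches, which is the point of the "something like" condition.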

PrasadL1IntroIR22 More sophisticated information retrieval
  • Cross-language information retrieval
  • Question answering
  • Summarization
  • Text mining
  • …

PrasadL1IntroIR23 Future Progress: Factors/Trends
Large, uncontrolled publishing media
  • Quality issues
Cheap, fast and wide access
  • Ease of use (query formulation)
Variety and flexibility
  • Navigational and Visualization aids
  • Directory-based (Table of contents) vs Keywords-based (Inverted File Index)
Index terms (automatic/human-created) vs Full-text
Privacy, Security, Copyright