Modern Information Retrieval

Slides:



Advertisements
Similar presentations
Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction Alexander Gelbukh
Advertisements

Special Topics in Computer Science The Art of Information Retrieval Chapter 1: Introduction Alexander Gelbukh
Chapter 5: Introduction to Information Retrieval
Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Multimedia Database Systems
An Introduction to Information Retrieval and Applications J. H. Wang Feb. 19, 2008.
UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
Web Search – Summer Term 2006 I. General Introduction (c) Wolfgang Hürst, Albert-Ludwigs-University.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Search Engines and Information Retrieval
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
ISP 433/533 Week 2 IR Models.
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement: Relevance Feedback Information Filtering.
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
INFORMATION RETRIEVAL WEEK 1 AND 2
What is a document? Information need: From where did the metaphor, doing X is like “herding cats”, arise? quotation? “Managing senior programmers is like.
Modern Information Retrieval Chapter 1 Introduction.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Srihari-CSE535-Spring2008 CSE 535 Information Retrieval Chapter 1: Introduction to IR.
Recuperação de Informação. IR: representation, storage, organization of, and access to information items Emphasis is on the retrieval of information (not.
Chapter 5: Information Retrieval and Web Search
 IR: representation, storage, organization of, and access to information items  Focus is on the user information need  User information need:  Find.
Search Engines and Information Retrieval Chapter 1.
Information Retrieval and Knowledge Organisation Knut Hinkelmann.
Modern Information Retrieval Computer engineering department Fall 2005.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Information Retrieval Introduction/Overview Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Autumn Web Information retrieval (Web IR) Handout #0: Introduction Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Chapter 6: Information Retrieval and Web Search
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Comparing and Ranking Documents Once our search engine has retrieved a set of documents, we may want to Rank them by relevance –Which are the best fit.
Search Engine Architecture
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Modern Information Retrieval Presented by Miss Prattana Chanpolto Faculty of Information Technology.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Recuperação de Informação Cap. 01: Introdução 21 de Fevereiro de 1999 Berthier Ribeiro-Neto.
Information Retrieval
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
G. Marchionini, Univ. of Maryland Electronic Environments Cost Trends: Hardware cost < Software cost < Information cost < People time Virtuality (transcend.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Information Retrieval and Web Search Vasile Rus, PhD websearch/
Information Retrieval in Practice
Information Retrieval in Practice
Visual Information Retrieval
Search Engine Architecture
Information Retrieval (in Practice)
Text Based Information Retrieval
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Search Engine Architecture
Multimedia Information Retrieval
Information Retrieval
Thanks to Bill Arms, Marti Hearst
Introduction into Knowledge and information
Evaluation of IR Performance
IL Step 3: Using Bibliographic Databases
CSE 635 Multimedia Information Retrieval
Ying Dai Faculty of software and information science,
Introduction to Information Retrieval
Lecture 8 Information Retrieval Introduction
Chapter 5: Information Retrieval and Web Search
Search Engine Architecture
Information Retrieval and Extraction
Information Retrieval and Web Design
Recuperação de Informação
ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH
Introduction to Search Engines
Presentation transcript:

Modern Information Retrieval Information Systems department Second Semester 2016-2017 Saif Rababah

Subjects 1. Introduction 2. Models 3. Retrieval evaluation 4. Query languages 5. Query reformulation 6. Text properties 7. Text languages 8. Text processing 9. Information retrieval from the Web 10. Multimedia IR 11. Multi-Lingual IR 12. P2P IR Saif Rababah

Sources 1. Modern Information RetrievalR. Baeza-Yates and B. Ribeiro-Neto; Addison-Wesley 2011. 2. Information Retrieval Algorithms and Heuristics . FRIEDER, OPHIR 2ND ED., 2004, XX, 332 P., SOFTCOVER , ISBN: 978-1-4020-3004-8. Various 3.Web sites. 4. Original material. Saif Rababah

Motivation IR: representation, storage, organization of, and access to information items Focus is on the user information need Information item: Usually text, but possibly also image, audio, video, etc. Presently, most retrieval of non-text items is based on searching their textual descriptions. Text items are often referred to as documents, and may be of different scope (book, article, paragraph, etc.). Saif Rababah

Motivation (cont.) User information need: Find all docs containing information on Computer Systems which: (1) are designed by IBM and (2) designed after 2004 Emphasis is on the retrieval of information (not data) Saif Rababah

Motivation (cont.) Data retrieval Information retrieval IR system: which docs contain a set of keywords? Well defined semantics a single erroneous object implies failure! Information retrieval information about a subject or topic semantics is frequently loose small errors are tolerated IR system: interpret contents of information items generate a ranking which reflects relevance notion of relevance is most important Saif Rababah

Motivation (cont.) IR at the center of the stage IR in the last 20 years: classification and categorization systems and languages user interfaces and visualization Still, area was seen as of narrow interest Advent of the Web changed this perception once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though: IR seen as key to finding the solutions! Saif Rababah

Objecttives 􀀹Overall objective: 􀀹Measurement of success: Minimize search overhead 􀀹Measurement of success: Precision and recall 􀀹Facilitate the overall objective: Good search tools Helpful presentation of results Saif Rababah

Minimize search overhead Minimize overheadof a user who is locating needed information. Overhead: Time spent in all steps leading to the reading of items containing the needed information (query generation, query execution, scanning results, reading non-relevant items, etc.). Needed information: Either Sufficient information in the system to complete a task. All information in the system relevant to the user needs. Example –shopping: Looking for an item to purchase. Looking for an item to purchase at minimal cost. Example –researching: Looking for a bibliographic citation that explains a particular term. Building a comprehensive bibliography on a particular subject. Saif Rababah

Measurement of success Two dual measures: 􀀹Precision: Proportion of items retrieved that are relevant. Precision = relevant retrieved / total retrieved = Answer  Relevant  / Answer  􀀹Recall: Proportion of relevant items that are retrieved. Recall = relevant retrieved / relevant exist = Answer  Relevant  /  Relevant  􀀹Most popular measures, but others exist. Saif Rababah

Measurement of success (cont.) Saif Rababah

Support user search Support user search, providing tools to overcome obstacles such as: Ambiguities inherent in languages. Homographs: Words with identical spelling but with multiple meanings.(عين) Example: Chinon—Japanese electronics, French chateau. Limits to user's ability to express needs. Lack of system experience or aptitude. Lack of expertise in the area being searched. Initially only vague concept of information sought. Differences between user's vocabulary and authors' vocabulary: different words with similar meanings. Saif Rababah

Presentation of results Present search results in format that helps user determine relevant items: Arbitrary (physical) order Relevance order Clustered (e.g., conceptual similarity) Graphical (visual) representation Saif Rababah

Basic Concepts The User Task Retrieval Browsing information or data purposeful Browsing glancing around F1; cars, Le Mans, France, tourism Retrieval Browsing Database Saif Rababah

Querying(retrieval) vs. Browsing Two complementary forms of information or data retrieval: Querying: Information need (retrieval goal) is focused and crystallized. Contents of repository are well-known. Often, user is sophisticated. Saif Rababah

Querying(retrieval) vs. Browsing (cont.) •Information need (retrieval goal) is vague and imprecise. •Contents of repository are not well-known. •Often, user is naive. Querying and browsing are often interleaved (in the same session). •Example: present a query to a search engine, browse in the results, restate the original query, etc. Saif Rababah

Pulling vs. Pushing information Querying and browsing are both initiated by users (information is “pulled” from the sources). Alternatively, information may be “pushed” to users. Dynamically compare newly received items against standing statements of interests of users (profiles) and deliver matching items to user mail files. Asynchronous (background) process. Profile defines all areas of interest (whereas an individual query focuses on specific question). Each item compared against many profiles (whereas each query is compared against many items). Saif Rababah

Basic Concepts Logical view of the documents Document representation viewed as a continuum: logical view of docs might shift Accents spacing Noun groups Manual indexing Docs stopwords stemming structure structure Full text Index terms Saif Rababah

The Retrieval Process Text User Interface user need Text Text Operations logical view logical view Query Operations DB Manager Module Indexing user feedback inverted file query Searching Index retrieved docs Text Database Ranking ranked docs Saif Rababah