1
Information Retrieval
Lebanese University
Faculty of Economics and Business Administration – 1st Branch
Class: M1
Instructor: Dr. Lina A. Nimri
2
Course Text Book
Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: X
3
Introduction
Modern Information Retrieval, Chapter 1
Ricardo Baeza-Yates, Berthier Ribeiro-Neto
4
Introduction
Examples of information needs in the context of the World Wide Web:
“Find all documents containing information on computer courses which: (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements and a phone number for contact purposes.”
“Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament.”
5
Information Retrieval
Representation, storage, organisation, and access to information items
(Usually) keyword-based representation
[Diagram labels: User; Information Need; Query; Documents; Search Engine; Retrieval System; Set of retrieved documents; Useful or relevant information to the user]
Primary goal of an IR system: “Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”
6
Data Retrieval
Determining which documents contain the keywords in the user query is not always enough to satisfy the user's information need.
Data retrieval retrieves objects which satisfy clearly defined conditions, such as regular expressions or relational algebra expressions.
A data retrieval system deals with data that has a well-defined structure and semantics.
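To make the "clearly defined conditions" of data retrieval concrete, here is a minimal sketch using a regular expression. The records and the helper name `data_retrieval` are hypothetical and only illustrate the idea: every record either satisfies the condition or it does not, and there is no ranking.

```python
import re

# Hypothetical toy records with a fixed, well-defined structure.
records = [
    "course: databases; city: Southampton",
    "course: tennis coaching; city: Boston",
    "course: computer networks; city: Brighton",
]

def data_retrieval(pattern, items):
    """Return exactly the items that match the regular expression."""
    compiled = re.compile(pattern)
    return [item for item in items if compiled.search(item)]

# The condition is exact: a record matches or it does not.
matches = data_retrieval(r"city: (Southampton|Brighton)", records)
print(matches)
```

A single wrong result counts as an error here, whereas an IR system is expected to tolerate imprecise matches and rank them.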
7
Information Retrieval System
Retrieving information about a subject
Deals with natural language text, which is not well structured and could be semantically ambiguous
It must interpret the contents of documents and rank them according to their degree of relevance to the user need.
8
Area of interest
Digital Libraries
Information experts
World Wide Web: a very difficult task
The hyperspace is vast
The absence of a well-defined data model (format or representation form)
9
Effective retrieval
The effective retrieval of relevant information is directly affected by:
The user task
The logical view of the document (the document’s representation) adopted by the retrieval system
10
User tasks
Pull technology
User requests information in an interactive manner
3 retrieval tasks:
Browsing (hypertext)
Retrieval (classical IR systems)
Browsing and retrieval (modern digital libraries and web systems)
Push technology
Automatic and permanent pushing of information to the user
Software agents
Example: news service
Filtering (retrieval task): relevant information for later inspection by the user
11
Pulling
The user can browse the documents when his main objectives are not clear at the beginning and may change during the interaction with the system.
Combination of retrieval and browsing is not yet a well-established approach.
[Diagram labels: Retrieval; Browsing; Database]
12
Documents
Unit of retrieval
A passage of free text
Composed of text, strings of characters from an alphabet
Composed of natural language
Examples: newspaper article, a journal paper, a dictionary definition, messages
Size of documents is arbitrary: newspaper article vs. journal paper vs.
13
What is a document?
14
Representation of documents
Documents are represented through a set of index terms or keywords (term descriptors)
Extracted directly from the text
Specified by human subjects (information science)
Metadata
Most concise representation
Poor quality of retrieval
Full text representation
Most complete representation
High computational cost
Large collections
Reduce the set of representative keywords
Elimination of stop words
Stemming
Identification of noun phrases
Further compression
Document term descriptors to access texts
Generation of descriptors for text
By hand
By analysing the text
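The reduction of a text to a smaller set of index terms (stop-word elimination followed by stemming) can be sketched as follows. The stop-word list and the suffix list are assumptions chosen for illustration, and `crude_stem` is a deliberately naive suffix stripper, not a real stemming algorithm such as Porter's.

```python
import re

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}

# Assumed suffixes for a crude suffix stripper; not the Porter algorithm.
SUFFIXES = ("ing", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, tokenise, drop stop words, stem, and deduplicate."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return sorted({crude_stem(t) for t in kept})

terms = index_terms("Indexing the documents and ranking the results for users")
print(terms)  # ['document', 'index', 'rank', 'result', 'user']
```

Nine words of text shrink to five index terms, which is the trade-off named on the slide: a more concise representation at the cost of some retrieval quality.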
15
Logical View of the documents
[Figure: logical view of a document, from full text to index terms. Labels: Docs; accents, spacing; stopwords; noun groups; stemming; manual indexing; structure; full text; index terms]
16
The retrieval functions
[Diagram labels: Information need; Formulation; Query; Documents; Indexing; Document representation; Retrieval functions; Retrieved documents; Relevance feedback]
17
Queries
Information need: user term descriptors characterising the user need
Simple queries: composed of two or three, perhaps even dozens, of keywords, e.g., as in web retrieval
Boolean queries: e.g., “neural networks AND speech recognition”
Context queries: proximity search, phrase queries
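A Boolean query over phrases, such as the “neural networks AND speech recognition” example, can be sketched like this. The three documents and the function names `contains_phrase`, `boolean_and`, and `boolean_or` are hypothetical; a real system would evaluate such queries against an index rather than scanning every text.

```python
# Hypothetical toy collection.
docs = {
    "d1": "neural networks for speech recognition",
    "d2": "speech recognition with hidden markov models",
    "d3": "neural networks for image classification",
}

def contains_phrase(text, phrase):
    """Phrase query: the words of the phrase must appear consecutively."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def boolean_and(phrases, collection):
    """Ids of documents that contain every phrase."""
    return sorted(doc_id for doc_id, text in collection.items()
                  if all(contains_phrase(text, p) for p in phrases))

def boolean_or(phrases, collection):
    """Ids of documents that contain at least one phrase."""
    return sorted(doc_id for doc_id, text in collection.items()
                  if any(contains_phrase(text, p) for p in phrases))

both = boolean_and(["neural networks", "speech recognition"], docs)
either = boolean_or(["neural networks", "speech recognition"], docs)
print(both)    # ['d1']
print(either)  # ['d1', 'd2', 'd3']
```

The result of a Boolean query is an unranked set: a document either satisfies the expression or it does not.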
18
Best-Match retrieval
Compare the terms in a document and the query
Compute the similarity between each document in the collection and the query, based on the terms that they have in common
Sort the documents in order of decreasing similarity with the query
The output is a ranked list displayed to the user; the top documents are the most relevant as judged by the system
Document term descriptors: used to access texts
User term descriptors: characterise the user need
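The best-match steps above can be sketched in a few lines. The similarity measure used here, the number of terms shared by query and document, is an assumption picked as the simplest measure consistent with the slide; the collection and the name `best_match` are hypothetical.

```python
def terms(text):
    """Represent a text as its set of lowercase terms."""
    return set(text.lower().split())

def best_match(query, collection):
    """Rank documents by the number of terms they share with the query."""
    q = terms(query)
    scored = [(len(q & terms(text)), doc_id)
              for doc_id, text in collection.items()]
    # Decreasing similarity; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents with no term in common are not returned.
    return [(doc_id, score) for score, doc_id in scored if score > 0]

# Hypothetical toy collection.
docs = {
    "d1": "college tennis teams in the ncaa tournament",
    "d2": "university computer courses",
    "d3": "tennis tournament results",
}

ranking = best_match("college tennis tournament", docs)
print(ranking)  # [('d1', 3), ('d3', 2)]
```

Unlike a Boolean query, the document `d3` is still returned although it lacks one query term; it simply ranks lower.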
19
Conceptual view of text retrieval system
[Diagram labels: Queries; Documents; Similarity Computation; Retrieved Documents]
20
Expanded view of text retrieval system
[Diagram labels: Queries; Documents; Indexing; Indexed Documents; Similarity Computation; Retrieved Documents; Ranked Documents]
21
Process of retrieving info
[Diagram labels: User Interface; User need; User feedback; Text; Text Operations; Logical view; Query Operations; Query; Indexing; Inverted file; Index; Document Repository Manager; Text repository; Similarity Computation (Searching); Retrieved docs; Ranking; Ranked docs]
Indexing languages: for representing queries and representing documents
Similarity algorithms
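The inverted file named in this process maps each term to the set of documents that contain it, so searching does not have to scan every text. A minimal sketch follows; the collection and the names `build_inverted_file` and `search` are hypothetical, and tokenisation is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_inverted_file(collection):
    """Map each term to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Hypothetical toy collection.
docs = {
    "d1": "information retrieval systems",
    "d2": "data retrieval with relational algebra",
    "d3": "information extraction from text",
}

index = build_inverted_file(docs)
print(sorted(index["retrieval"]))              # ['d1', 'd2']
print(search(index, "information retrieval"))  # ['d1']
```

The index is built once at indexing time; each query then touches only the postings of its own terms.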
22
Key Topics
Indexing text documents
Retrieving text documents
Evaluation
Query reformulations
Search Engines = IR + Link Structure + Name Interpretation
23
Information Retrieval vs Information Extraction
Information Retrieval: given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant ones [recall].
Information Extraction: extract from the text what the document means.
IR systems can FIND documents but need not “understand” them.
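Precision and recall can be computed directly from the set of retrieved documents and the set of relevant documents. The sketch below uses the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant); the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p)  # 0.5
print(round(r, 3))  # 0.667
```

Retrieving more documents can only keep recall the same or raise it, but it usually lowers precision, which is why the two are reported together.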