Published by Stella Garrett. Modified over 9 years ago.
1 Information Retrieval
Acknowledgements: Dr Mounia Lalmas (QMW), Dr Joemon Jose (Glasgow)
2 Course Text
Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: 0-201-39829-X
3 Introduction
Example of an information need in the context of the World Wide Web:
“Find all documents containing information on computer courses which: (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements, and an e-mail address and phone number for contact purposes.”
4 Information Retrieval
Representation, storage, organisation, and access to information items
– (Usually) keyword-based representation
Primary goal of an IR system
– “Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”
Diagram labels: Information Need, Query, Documents, Retrieval System (Search Engine), Set of retrieved documents, Useful or relevant information to the user
5 User tasks
Pull technology
– User requests information in an interactive manner
– 3 retrieval tasks
  – Browsing (hypertext)
  – Retrieval (classical IR systems)
  – Browsing and retrieval (modern digital libraries and web systems)
Push technology
– Automatic and permanent pushing of information to the user
– Software agents
– Example: news service
– Filtering (a retrieval task): selects relevant information for later inspection by the user
6 Documents
Unit of retrieval
A passage of free text
– composed of text, i.e. strings of characters from an alphabet
– composed of natural language: newspaper article, journal paper, dictionary definition, email messages
– size of documents is arbitrary: newspaper article vs. journal paper vs. email
7 What is a document?
8 Representation of documents
Set of index terms or keywords
– extracted directly from text
– specified by human subjects (information science): metadata
– Most concise representation
– Poor quality of retrieval
Full text representation
– Most complete representation
– High computational cost
Large collections
– Reduce the set of representative keywords
  – Elimination of stop words
  – Stemming
  – Identification of noun phrases
  – Further compression
Structure representation
– Chapter, section, sub-section, etc.
Document term descriptors to access texts
Generation of descriptors for text
– By hand
– By analysing the text
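The keyword-reduction steps listed on this slide (stop-word elimination and stemming) can be sketched as below. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemming algorithm, not the method the course uses.

```python
import re

# Hypothetical, very small stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "of", "on", "the", "to"}

# Naive suffix rules (assumption): (suffix to strip, replacement).
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("s", "")]


def crude_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    if word.endswith("ss"):
        return word  # leave words such as "class" untouched
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def index_terms(text):
    """Return the set of index terms for a passage of free text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


terms = index_terms("Courses offered by the universities in South England")
print(sorted(terms))
# ['course', 'england', 'offered', 'south', 'university']
```

The resulting set is the "most concise" keyword representation of the passage; the full text would keep every token, at higher cost.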
9 The retrieval process
Diagram labels:
– Information need, Query, Formulation
– Documents, Document representation, Indexing
– Retrieved documents, Retrieval functions, Relevance feedback
10 Queries
Information need
Simple queries
– composed of two or three, perhaps even dozens of, keywords
– e.g., as in web retrieval
Boolean queries
– “neural networks AND speech recognition”
Context queries
– Proximity search, phrase queries
User term descriptors characterising the user need
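To make the query types on this slide concrete, here is a minimal sketch of phrase, Boolean AND, and proximity matching over whitespace-tokenised text. The function names (`matches_phrase`, `matches_near`) and the three sample documents are hypothetical, introduced only for illustration.

```python
def tokens(text):
    """Lower-case and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def matches_phrase(doc_text, phrase):
    """Phrase query: the words must occur adjacent and in order."""
    words = tokens(doc_text)
    target = tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches_near(doc_text, left, right, window):
    """Proximity query: the two terms occur within `window` positions."""
    words = tokens(doc_text)
    left_pos = [i for i, w in enumerate(words) if w == left]
    right_pos = [i for i, w in enumerate(words) if w == right]
    return any(abs(i - j) <= window for i in left_pos for j in right_pos)


docs = {
    "d1": "neural networks for speech recognition",
    "d2": "speech synthesis with neural networks",
    "d3": "recognition of speech by networks of neural units",
}

# Boolean query from the slide: both phrases must be present (AND).
boolean_hits = [
    doc_id
    for doc_id, text in docs.items()
    if matches_phrase(text, "neural networks")
    and matches_phrase(text, "speech recognition")
]
print(boolean_hits)  # ['d1']
```

Note that `d3` fails the phrase query for "speech recognition" but would satisfy a proximity query such as `matches_near(docs["d3"], "speech", "recognition", 2)`.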
11 Best-Match Retrieval
– Compare the terms in a document and the query
– Compute the similarity between each document in the collection and the query, based on the terms that they have in common
– Sort the documents in order of decreasing similarity with the query
– The output is a ranked list displayed to the user; the top documents are the most relevant as judged by the system
Document term descriptors to access texts
User term descriptors characterising the user need
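The best-match steps above can be sketched as follows. The slide does not name a similarity measure, so cosine similarity over raw term counts is used here purely as an assumption; the `rank` helper and the sample collection are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return doc ids sorted by decreasing similarity with the query.

    Documents sharing no term with the query are left out.
    """
    q = Counter(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = cosine(q, Counter(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by doc id for a stable order.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs = {
    "d1": "information retrieval systems",
    "d2": "retrieval of information from text collections",
    "d3": "database systems",
}
print(rank("information retrieval", docs))  # ['d1', 'd2']
```

Both `d1` and `d2` contain the two query terms, but `d1` ranks first because the shared terms make up a larger share of the shorter document.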
12 Conceptual View of Text Retrieval
Diagram labels: Queries, Documents, Similarity Computation, Retrieved Documents
13 Expanded view of text retrieval system
Diagram labels: Queries, Documents, Indexing, Indexed Documents, Similarity Computation, Retrieved Documents, Ranked Documents
14 Process of retrieving info
Diagram components: User Interface, Text Operations, Query Operations, Indexing, Similarity Computation, Ranking, Document Repository Manager, Index, Text repository
Diagram flows: User need, Logical view, Inverted file, Query, Retrieved docs, Text, User feedback, Ranked docs
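The "Inverted file" produced by the indexing step in this diagram can be illustrated with a small sketch. It maps each term to the list of documents containing it; the function names and the three-document collection are assumptions made for the example, and tokenisation is plain whitespace splitting.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup(index, query):
    """Return ids of documents that contain every term of the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "modern information retrieval",
    2: "information extraction from text",
    3: "text retrieval systems",
}
index = build_inverted_file(docs)
print(index["information"])                   # [1, 2]
print(lookup(index, "information retrieval")) # [1]
```

With the index in place, answering a query only touches the postings of the query terms rather than scanning every document in the text repository.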
15 Key Topics
– Indexing text documents
– Retrieving text documents
– Evaluation
– Query reformulations
– Search Engines = IR + Link Structure + Name Interpretation
16 Information Retrieval vs Information Extraction
Information Retrieval
– Given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant documents [recall].
Information Extraction
– Extract from the text what the document means.
IR systems can FIND documents but need not “understand” them
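The bracketed terms precision and recall have standard set-based definitions, sketched below: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document ids in the example are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids returned by the system
    relevant:  ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p)  # 0.5
```

Here half of what was retrieved is relevant (precision 0.5), while two of the three relevant documents were found (recall 2/3).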