Special Topics in Computer Science
Advanced Topics in Information Retrieval
Chapter 1: Introduction
Motivation
- IR was first developed for libraries; now it serves the WWW
- Information: representation, storage, organization, access
- Search engines (IR systems)
- User information need
  - A plain-English description, which must be turned into a query
- Concerns of modern IR:
  - modeling
  - classification, categorization, filtering
  - system architecture
  - user interfaces, visualization, query languages
Data vs. Information Retrieval
- Data retrieval
  - Precise description
  - Well-structured data
  - Precise results
  - Yes-or-no results
  - Science
- Information retrieval
  - Vague information need
  - Natural language, images, ...
  - Semantic interpretation
  - Approximate results
  - Relevance ranking
  - Art!
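The contrast above can be made concrete with a small sketch. It assumes a hypothetical `Record` type, an invented toy collection, and word overlap as a stand-in relevance score; none of these come from the slides. `data_retrieval` applies a precise field condition and returns a yes-or-no result set. `information_retrieval` scores every record against a vague query and returns a ranked list that includes partial matches.

```python
from dataclasses import dataclass

@dataclass
class Record:
    author: str
    title: str
    text: str

# Hypothetical toy collection, invented for illustration only.
DOCS = [
    Record("Smith", "Indexing text", "how to index text documents for fast search"),
    Record("Jones", "Hash tables", "hashing and searching sorted tables"),
    Record("Smith", "Ranking", "ranking documents by relevance to a query"),
]

def data_retrieval(records, author):
    """Exact match on a structured field: each record either qualifies or not."""
    return [r.title for r in records if r.author == author]

def information_retrieval(records, query):
    """Score each record by how many query words it shares, then rank.

    Partial matches are kept; records sharing no word are dropped.
    Word overlap is an assumed, deliberately simple relevance measure.
    """
    query_words = set(query.lower().split())
    scored = []
    for r in records:
        words = set(f"{r.title} {r.text}".lower().split())
        overlap = len(query_words & words)
        if overlap:
            scored.append((overlap, r.title))
    # sort() is stable, so records with equal scores keep collection order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]

print(data_retrieval(DOCS, "Smith"))  # ['Indexing text', 'Ranking']
print(information_retrieval(DOCS, "ranking documents by relevance"))  # ['Ranking', 'Indexing text']
```

The exact-match query is unforgiving: `data_retrieval(DOCS, "smith")` returns nothing. The ranked query still surfaces "Indexing text", which shares only one word with the query.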
Basic Concepts
- User task (search)
  - Users who can formulate what they need: retrieval (classical)
  - Users who cannot (or do not know what they need): browsing (new to IR); still not well integrated
  - Filtering (user passive, content active)
- Logical view of documents
  - Full text with added linguistic information (not clear whether it helps)
  - Full text
  - Text operations reduce the text to index terms:
    - keywords, stopword removal
    - stemming, noun groups (requires linguistic processing)
  - Categories
  - Tradeoff along this spectrum: richer views (toward full text) are slow but good; reduced views (toward categories) are fast but bad
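The text operations listed above can be sketched as a short pipeline from full text to index terms. The stopword list and the suffix list are invented for illustration. The stemmer is a naive suffix stripper, not the Porter stemmer or any other published algorithm. Noun-group detection, which needs real linguistic processing, is omitted.

```python
import re

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}

# Naive suffix stripping, NOT the Porter stemmer; the suffix list is an assumption.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Full text -> lowercase tokens -> drop stopwords -> stem to index terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(index_terms("Indexing the documents in libraries"))
# ['index', 'document', 'librari']
```

The stem "librari" shows that crude stemming can produce non-words. This is usually acceptable as long as queries pass through the same `index_terms` function, so that both sides map to the same term.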
Past, Present, and Future
- Since clay tablets
  - Alphabetical index (formal)
  - Table of contents (by storage order)
  - Classifications (by meaning)
- Libraries
  - Automation of classical techniques; catalogs
  - Search by fields (exact match: author, title, keywords)
- Web and digital libraries: interactive
  - Lower cost, leading to huge amounts of data
  - Networks, giving remote access and a wider audience
  - Free publishing, producing unprepared, heterogeneous data
- Artificial intelligence and linguistic methods
Main Concerns
- Open audience
  - Help people formulate their information need
  - Improve retrieval quality (intelligent methods)
- Efficiency (speed)
  - Development of fast techniques
- Interaction
  - Watch user behavior to improve quality
  - Privacy!
- Open content
  - Legal issues: copyright, responsibility for information quality
  - Intelligent methods
Retrieval Process
- Database
  - Define the logical view: text operations, text model
  - Build an index (e.g., an inverted file)
- User query
  - Query operations (users are not good at this!)
- Retrieved documents
  - Ranked by likelihood of relevance
- Feedback cycle
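A minimal sketch of the index-then-rank steps above: an inverted file mapping each term to the documents containing it, and a query scored only against those postings. The slides name no ranking function. The tf-idf sum used here, with idf = log(N / document frequency), is one common choice swapped in for illustration. The toy collection and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Inverted file: term -> {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(index, n_docs, query):
    """Return matching doc ids ranked by a simple tf-idf sum over query terms."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term never occurs; contributes nothing
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical toy collection for illustration.
DOCS = [
    "inverted files map terms to documents",
    "users write short queries",
    "ranking documents by relevance to queries",
]
INDEX = build_inverted_index(DOCS)
print(search(INDEX, len(DOCS), "ranking documents"))  # [2, 0]
```

Only documents that appear in some query term's postings are ever scored, which is what makes an inverted file efficient on large collections. Document 2 ranks first because it contains the rare term "ranking" as well as "documents".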
The Textbook: Text IR
- Models and Evaluation
  - Modeling (basic concepts)
  - Retrieval Evaluation
- Improvements on Retrieval
  - Query Languages
  - Query Operations
  - Text Languages and Properties
  - Text Operations
- Efficiency
  - Indexing and Searching
Conferences and Journals
- Conferences on IR
  - IR
  - ACM SIGIR
  - TREC
  - SPIRE
- Journal
  - IR
- General conferences on text processing
  - ACL
  - COLING
  - CICLing
  - DEXA (databases)
  - NLDB
Conclusions
- User information need
  - Vague
  - Semantic, not formal
- Document relevance
  - Rank documents instead of merely retrieving them
- Huge amount of information
  - Efficiency concerns
  - Tradeoffs
- IR is more art than science
