Slide 1 What is this course about?
Processing
Indexing
Retrieving
… textual data
It fits in four lines, but it is much more complex and interesting than that.
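As a rough illustration of what the three verbs on this slide can mean in practice, here is a minimal Python sketch (the course itself recommends Perl). It assumes a toy collection, a regex tokenizer for "processing", an inverted index for "indexing", and a Boolean AND query for "retrieving". The function names and sample documents are hypothetical and are not the course's method.

```python
import re
from collections import defaultdict

def process(text):
    """Processing: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: build an inverted index mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in process(text):
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieving: return the ids of documents that contain every query term (Boolean AND)."""
    terms = process(query)
    if not terms:
        return set()
    # Start from the first term's postings and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical toy collection for illustration.
docs = {
    1: "Information retrieval of textual data",
    2: "Processing text with Perl",
    3: "Indexing and retrieval of Web documents",
}
index = build_index(docs)
print(sorted(retrieve(index, "retrieval")))      # [1, 3]
print(sorted(retrieve(index, "Web retrieval")))  # [3]
```

Each stage here is deliberately naive (no stemming, no stopwords, no ranking); later topics in the schedule replace each of them with something more realistic.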
Slide 2 Need for IR
With the advance of the WWW: more than 3 billion documents indexed by Google
Various needs for information:
–Search for documents that fall within a given topic
–Search for a specific piece of information
–Search for an answer to a question
–Search for information in a different language
Slide 3 Some definitions of Information Retrieval (IR)
Salton (1989): “Information-retrieval systems process files of records and requests for information, and identify and retrieve from the files certain records in response to the information requests. The retrieval of particular records depends on the similarity between the records and the queries, which in turn is measured by comparing the values of certain attributes attached to records and information requests.”
Kowalski (1997): “An Information Retrieval System is a system that is capable of storage, retrieval, and maintenance of information. Information in this context can be composed of text (including numeric and date data), images, audio, video, and other multi-media objects.”
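Salton's definition hinges on one mechanism: score each record by comparing its attribute values with those of the request. A minimal Python sketch, assuming the attributes are simply term frequencies and that cosine similarity is the comparison (the vector model from the schedule, chosen here for illustration; neither definition prescribes it). The names `attributes`, `cosine`, and `rank` and the sample records are hypothetical.

```python
import math
from collections import Counter

def attributes(text):
    """Hypothetical attribute extraction: term frequencies of the lowercased, whitespace-split words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse attribute vectors (term -> weight)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(records, request):
    """Compare every record with the request and return (score, record) pairs, most similar first."""
    q = attributes(request)
    scored = [(cosine(attributes(r), q), r) for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

records = ["information retrieval systems", "library catalog search", "retrieval of music"]
for score, record in rank(records, "information retrieval"):
    print(f"{score:.3f}  {record}")
# 0.816  information retrieval systems
# 0.408  retrieval of music
# 0.000  library catalog search
```

Swapping in a different attribute extractor or similarity function changes the ranking without changing the overall shape Salton describes: records and requests are both reduced to attribute values, and retrieval is ordered by their similarity.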
Slide 4 Examples of IR systems
Conventional (library catalog)
–Search by keyword, title, author, etc.
–E.g., the library catalog you are probably familiar with
Text-based (Lexis-Nexis, Google, FAST)
–Search by keywords; limited search using natural-language queries
Multimedia (QBIC, WebSeek, SaFe)
–Search by visual appearance (shapes, colors, …)
Question answering systems (AskJeeves, Answerbus)
–Search in (restricted) natural language
Other: cross-language information retrieval, music retrieval
Slide 7 IR systems on the Web
–Search for Web pages
–Search for images
–Search for image content
–Search for answers to questions
–Search for music?
Class meets TTh, 2:00-3:20pm
Slide 9 Course resources
Textbook:
–Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Recommended:
–Readings in Information Retrieval, K. Sparck Jones and P. Willett
–See the class website for pointers to places to buy them for less
Papers from conferences and journals will be assigned throughout the course. Whenever possible, a copy of each paper will be placed on the class website.
Students are free to choose the programming language they want to work with. However:
–I recommend working with Perl
–We'll have a short Perl tutorial in the next 1-2 lectures
–Why Perl? It makes life much easier for text processing problems and for Web-based applications; Information Retrieval involves a lot of text processing and often involves Web access
–Code reusability
Slide 12 Tentative schedule
Course Overview
Short Perl Tutorial
Introduction to IR models and methods
Text analysis / document preprocessing
Vector model
Boolean model
Probabilistic model; other IR models
IR collections
IR evaluation
Query operations
Query languages
Natural Language IR (Named Entity Recognition)
Slide 13 Tentative schedule
Natural Language IR (Semantic ambiguity, conceptual indexing)
Natural Language IR (Phrase indexing, other)
Question Answering: TREC / Web
Information extraction
Text classification / Topic tracking and detection
Web IR: crawlers
Web IR: search engines
Web IR: link-based / content-based
Web IR: evaluation metrics / Midterm review
Special topics: Cross-Language IR
Special topics
Final IR overview, future directions
…
Midterm I, Midterm II, Project presentations