LIS618 lecture 1 Thomas Krichel 2003-01-29. economic rational for traditional model In olden days the cost of telecommunication was high. database use.

Slides:



Advertisements
Similar presentations
LIS618 lecture 2 Thomas Krichel Structure Theory: information retrieval performance Practice: more advanced dialog.
Advertisements

LIS618 lecture 1 Thomas Krichel Organization homepage
LIS618 lecture 6 Thomas Krichel structure DIALOG –basic vs additional index –initial database file selection (files) Lexis/Nexis.
INFO624 - Week 2 Models of Information Retrieval Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.
Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Basic IR: Modeling Basic IR Task: Slightly more complex:
Modern Information Retrieval Chapter 1: Introduction
LIS618 lecture 9 Thomas Krichel Structure Google “theory”, see essay by Brin and Page fullpapers/1921/com1921.htm.
LIS618 lecture 9 Web retrieval Thomas Krichel
Web Search - Summer Term 2006 II. Information Retrieval (Basics Cont.)
IR Models: Overview, Boolean, and Vector
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
ISP 433/533 Week 2 IR Models.
IR Models: Structural Models
Models for Information Retrieval Mainly used in science and research, (probably?) less often in real systems But: Research results have significance for.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) IR Queries.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
IR Models: Latent Semantic Analysis. IR Model Taxonomy Non-Overlapping Lists Proximal Nodes Structured Models U s e r T a s k Set Theoretic Fuzzy Extended.
Advance Information Retrieval Topics Hassan Bashiri.
Vector Space Model CS 652 Information Extraction and Integration.
1 Discovering Unexpected Information from Your Competitor’s Web Sites Bing Liu, Yiming Ma, Philip S. Yu Héctor A. Villa Martínez.
Information Retrieval
LIS618 lecture 11 i/r performance evaluation Thomas Krichel
Chapter 5: Information Retrieval and Web Search
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
LIS618lecture 0 Introduction to the course Thomas Krichel
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
LIS510 lecture 3 Thomas Krichel information storage & retrieval this area is now more know as information retrieval when I dealt with it I.
LIS618 lecture 0 Thomas Krichel today's lecture A look at the course home page administrative.
LIS618 lecture 2 the Boolean model Thomas Krichel
Week 9 Search Engines and the Invisible Web. Resource Pages Collections of Links Compiled by “experts” Sometimes annotated Targeted Information for a.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
Information Retrieval Introduction/Overview Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
LIS618 lecture 4 Thomas Krichel Structure of talk Before online searching Introduction to online searching Introduction to DIALOG –Overview.
Autumn Web Information retrieval (Web IR) Handout #0: Introduction Ali Mohammad Zareh Bidoki ECE Department, Yazd University
The Internet 8th Edition Tutorial 4 Searching the Web.
Chapter 6: Information Retrieval and Web Search
LIS618 lecture 0 Thomas Krichel today's lecture I will not talk about the strike. A look at the course home page
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Model Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
LIS618 lecture 3 Thomas Krichel Structure of talk Document Preprocessing Basic ingredients of query languages Retrieval performance evaluation.
University of Malta CSA3080: Lecture 6 © Chris Staff 1 of 20 CSA3080: Adaptive Hypertext Systems I Dr. Christopher Staff Department.
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
Search Engines Reyhaneh Salkhi Outline What is a search engine? How do search engines work? Which search engines are most useful and efficient? How can.
Autumn Web Information retrieval (Web IR) Handout #1:Web characteristics Ali Mohammad Zareh Bidoki ECE Department, Yazd University
LIS618 lecture 0 Thomas Krichel Organization homepage Contents to be discussed today. Send mail.
Structure of IR Systems INST 734 Module 1 Doug Oard.
Modern Information Retrieval Presented by Miss Prattana Chanpolto Faculty of Information Technology.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Recuperação de Informação Cap. 01: Introdução 21 de Fevereiro de 1999 Berthier Ribeiro-Neto.
Performance Measurement. 2 Testing Environment.
Information Retrieval
C.Watterscsci64031 Classical IR Models. C.Watterscsci64032 Goal Hit set of relevant documents Ranked set Best match Answer.
LIS618 lecture 8 Thomas Krichel Lexis/Nexis Lexis is a specialized legal research service Nexis is primarily a news services adds an important.
Information Retrieval CSE 8337 Spring 2005 Modeling (Part II) Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates.
1 CS 430: Information Discovery Lecture 8 Collection-Level Metadata Vector Methods.
Lecture 1: Introduction and the Boolean Model Information Retrieval
Text Based Information Retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Information Retrieval and Web Design
Recuperação de Informação
Information Retrieval and Web Design
Presentation transcript:

LIS618 lecture 1 Thomas Krichel

economic rational for traditional model In olden days the cost of telecommunication was high. database use costs –cost of communication –cost of access time to the database the traditional model controls an upper bound on costs

disintermediation with access cost time gone, the traditional model is under threat there is disintermediation where the librarian looses her role but that may not be good news for information retrieval results –user knows subject matter best –librarian knows searching best

Web searching IR has received a lot of impetus through the web, which poses unprecedented search challenges. with more and more data appearing on the web DS may be a subject in decline –it is primarily concerned with non-web databases –There is more and more web-based methods of searching

Quote to think about “Clearly, intermediated searching has past its prime. No longer does a search require a searcher—at least not a professional one.” By Barbara Quint, in “Quint’s Online”, a regular column she contributes for Information Today. It appeared in the issue.

Quint’s points It was forseeable in 1992 that the public at large would be able to do online searching. At the same time need for quality answers has grown. –Google ask-a service Quality-filtered services will become more important.

Quint’s points From the requirement for vetted sources, it does not follow that formal publishing and databases will flourish. The current offerings will have to change. In the current databases, there is as lot that would already be available for free mixed with quality-controlled stuff. –Item based pricing implies same price for all items –Subscription-based pricing still means that the user has to make quality judgment

Quint’s points A good business news service would –Extracts reporting form company sites –Gets external reviews on the reports –Post statistical counts about the company –Offer to get users and authors in touch so that authors could do private research for a user. Publishers have direct offerings and intermediated vending is in decline. –Example: CSA pulls out of DIALOG

Main theory part Literature: "Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribiero-Neto Don't buy it. It is a not a good book.

before the IR process provider –define data that is available documents that can be used document operations document structure –index user –user need –IR system familiarity

the IR process query expresses user need in a query language processing of query yields retrieved documents calculation of relevance ranking examination of retrieved documents possible relevance cycle

main problem user is not an expert at the formulation of a query garbage in garbage out, the retrieval yields poor result ways out –design very intuitive interface for the query –give expert guidance

taxonomy of classic IR models Boolean, or set-theoretic –fuzzy set models –extended Boolean vector, or algebraic –generalized vector model –latent semantic indexing –neural network model probabilistic –inference network –belief network

summary There are three basic types of models in classic information retrieval. Extensions of these types are a matter of research concern and require good mathematical skills. All classic models treat document as individual pieces.

key aid: index an index is a list of terms, with a list of locations where the term is to be found. The way to express locations usually depends on the form that the indexed data takes. –for a book, it is usually the page number, e.g. "shmoo 34, 75" –for computer files it is usually the name of the file plus the number of the byte where the indexed term starts, e.g. "krichel index.html 34, cv.html " there is usually more than one location of the term.

key aid: index terms index term is a part of the document that has a meaning on its own. it is usually a noun word. retrieval based on index term raises questions –semantics in query or document is lost –matching done in imprecise space of index terms predicting relevance is a central problem the IR model determines the process of relevance ranking

basic concept: weight of index term given all nouns, not all appear to have the same relevance to the text sometimes, we can have a simple measure of the importance of a term, example? more generally, for each indexing term and each document we can associate a weight with the term and the document. usually, if the document does not contain the term, its weight is zero

Boolean model in the Boolean model, the index weight of all index term for any document is 1 if the term appears in the document. It is 0 otherwise. This allows to combine query terms with Boolean operator AND, OR, and NOT thus powerful queries can be written

Thank you for your attention!