LIS618 lecture 1 Thomas Krichel 2002-09-15


LIS618 lecture 1 Thomas Krichel

Organization homepage
Contents to be discussed today.
Send mail to
–Your name
–Your secret word for grades delivery
Interrupt me with as many questions as possible!
Ask for breaks!

Proposed Organization
Normal lecture
Quiz at the beginning of every lecture.
Main quiz next week (25% of grade)
Search exercise 55%
Other quizzes 10%
Formal syllabus to be made early next week!

Search exercise
find victim
conduct interview about an information need experienced by the victim, write down expectations
search in Dialog and on web
discuss results with the victim
write essay, no longer than 7 pages.

Structure of talk
First talk about me, then about you and the course
General round trip on theoretical matters.
–Context of database searching
–Database searching and information retrieval
–The retrieval process
–Information retrieval models
–Retrieval performance evaluation
–Query languages
Logging on to Dialog
Web searching exercise (if time permits)

About me
Born 1965, in Völklingen (Germany)
Studied economics and social sciences at the Universities of Toulouse, Paris, Exeter and Leicester.
PhD in theoretical macroeconomics
Lecturer in Economics at the University of Surrey from 1993 to 2001
Since 2001 assistant professor at the Palmer School

Why?
During my research assistantship period (1990 to 1993), I was constantly frustrated with difficult access to scientific literature.
At the same time, I discovered easy access to freely downloadable software over the Internet.
I decided to work towards downloadable scientific documents.
This led to my library career (eventually).

Steps taken I
1993 founded the NetEc project at later available at as well as at
These are networking projects targeted to the economics community.
The bulk is
–Information about working papers
–Downloadable working papers
–Journal articles were added later

Steps taken II
Set up RePEc, a digital library for economics research.
Catalogs
–Research documents
–Collections of research documents
–Researchers themselves
–Organizations that are important to the research process
Decentralized collection, model for the open archives initiative

Steps taken III
Co-founder of Open Archives Initiative
Work on the Academic Metadata Format
Co-founded rclis, a RePEc clone for (Research in Computing, Library and Information Science)

summary
There are three basic types of models in classic information retrieval.
Extensions of these types are a matter of research concern and require good mathematical skills.
All classic models treat documents as individual pieces.

Database searching (DS)
subset of the subject of information retrieval (IR)
DS is mainly thought of as applicable to large structured databases, as opposed to web searching
for those, a general knowledge of what databases are seems useful
Concentrate on textual databases

traditional social model
user goes to a library
describes problem to the librarian
librarian does the search
–without the user present
–with the user present
hands over the result to the user
user fetches full-text or asks a librarian to fetch the full text.

economic rationale for traditional model
In olden days the cost of telecommunication was high.
database use costs
–cost of communication
–cost of access time to the database
the traditional model puts an upper bound on costs

disintermediation
with access time costs gone, the traditional model is under threat
there is disintermediation where the librarian loses her role
but that may not be good news for information retrieval results
–user knows subject matter best
–librarian knows searching best

Web searching
IR has received a lot of impetus through the web, which poses unprecedented search challenges.
with more and more data appearing on the web, DS may be a subject in decline, because it is primarily concerned with non-web databases

Main theory part
Literature: "Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto
Don't buy it. It is not a good book.

before the IR process
provider
–define data that is available
  documents that can be used
  document operations
  document structure
–index
user
–user need
–IR system familiarity

the IR process
query expresses user need in a query language
processing of query yields retrieved documents
calculation of relevance ranking
examination of retrieved documents
possible relevance cycle

main problem
user is not an expert at the formulation of a query
garbage in garbage out, the retrieval yields poor result
ways out
–design very intuitive interface
–give expert guidance

key aid: index
index term is a part of the document that has a meaning on its own (usually a noun)
retrieval based on index term raises questions
–semantics in query or document is lost
–matching done in imprecise space of index terms
predicting relevance is a central problem
the IR model determines the process of relevance ranking
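
An index of this kind can be sketched as a mapping from each index term to the documents that contain it. The sketch below assumes a made-up three-document collection and treats every lowercase word as an index term; it does not attempt the noun selection that the slide mentions. The names DOCS, build_index and INDEX are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection; ids and texts are invented for illustration.
DOCS = {
    1: "library catalog search",
    2: "web search engine",
    3: "library web site",
}


def build_index(docs):
    """Map each index term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: every whitespace-separated word is an index term.
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


INDEX = build_index(DOCS)
print(sorted(INDEX["search"]))   # ids of documents containing "search"
print(sorted(INDEX["library"]))  # ids of documents containing "library"
```

Looking a term up in INDEX returns documents without reading their full text, which is also where the semantics of the original sentence are lost.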

taxonomy of classic IR models
Boolean, or set-theoretic
–fuzzy set models
–extended Boolean
vector, or algebraic
–generalized vector model
–latent semantic indexing
–neural network model
probabilistic
–inference network
–belief network

basic concepts: index term
an index term is a word whose semantics help to remember the document's main themes.
nouns are mainly used
if all words are index terms, the logical view of the document is full text

basic concept: weight of index term
given all nouns, not all appear to have the same relevance to the text
sometimes, we can have a simple measure of the importance of a term, example?
more generally, for each indexing term and each document we can associate a weight with the term and the document.
usually, if the document does not contain the term, its weight is zero
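
One simple candidate for such a measure, offered here as an assumption rather than as the lecture's own answer, is the number of times the term occurs in the document. The function name term_weights and the sample text are hypothetical.

```python
from collections import Counter


def term_weights(text, vocabulary):
    """Weight of each vocabulary term in one document, as a raw occurrence count."""
    counts = Counter(text.lower().split())
    # A Counter returns 0 for a missing key, so absent terms get weight zero.
    return {term: counts[term] for term in vocabulary}


WEIGHTS = term_weights("search the web search", ["search", "web", "library"])
print(WEIGHTS)
```

A term that is absent from the document gets weight zero, as the slide describes.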

basic concept: mutual term independence
Thinking of the weight of a term as a function of the document and the term only implies that it is independent of other terms.
This is an important oversimplification.
But it allows for fast computation.
No study has shown that not assuming independence brings significant performance gain.

Boolean model
in the Boolean model, the weight of every index term for any document is 1 if the term appears in the document. It is 0 otherwise.
This allows query terms to be combined with the Boolean operators AND, OR, and NOT
thus powerful queries can be written
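
A minimal sketch of these binary weights, assuming a made-up collection and a hypothetical query "library AND NOT web". The names DOCS, VOCABULARY, binary_weights and library_and_not_web are illustrative only.

```python
# Hypothetical toy collection; ids and texts are invented for illustration.
DOCS = {
    1: "library catalog search",
    2: "web search engine",
    3: "library web site",
}
VOCABULARY = ["library", "search", "web"]


def binary_weights(text, vocabulary):
    """Weight is 1 if the term appears in the document, 0 otherwise."""
    present = set(text.lower().split())
    return {term: int(term in present) for term in vocabulary}


def library_and_not_web(weights):
    """Hypothetical query: library AND NOT web."""
    return bool(weights["library"] and not weights["web"])


HITS = [
    doc_id
    for doc_id, text in DOCS.items()
    if library_and_not_web(binary_weights(text, VOCABULARY))
]
print(HITS)
```

Each document either satisfies the query or does not; nothing in the weights says how well it does.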

example: a AND (b OR NOT c)
[Venn diagram of the sets a, b and c]
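
The example query can be worked through with set operations, where each term stands for the set of documents that contain it. The document ids and set memberships below are invented for illustration; only the query itself comes from the slide.

```python
# Hypothetical collection of six documents, identified by number.
ALL_DOCS = {1, 2, 3, 4, 5, 6}

# Assumed sets of documents containing each term.
A = {1, 2, 3, 4}
B = {2, 3, 5}
C = {3, 4, 5, 6}

# NOT c is the complement of C within the collection.
NOT_C = ALL_DOCS - C

# a AND (b OR NOT c): AND is intersection, OR is union.
RESULT = A & (B | NOT_C)
print(sorted(RESULT))
```

Document 4 contains a and c but not b, so it is excluded, while document 3 contains all three terms and is retrieved because it satisfies the b branch.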

advantages of Boolean model
supposedly easy to grasp by the user
precise semantics of queries
implemented in the majority of commercial systems
why is it set-theoretic ?

problems of Boolean model
sharp distinction between relevant and irrelevant documents
no ranking possible
users find it difficult to formulate Boolean queries

Thank you for your attention!