
© Tefko Saracevic, Rutgers University
1. Discussion 2. Information retrieval (IR) model (the traditional models) 3. The review of the readings. Announcement.


2 Feb. 3, 2003
1. Discussion
2. Information retrieval (IR) model (the traditional models)
3. The review of the readings
Announcement

3 Information retrieval (IR): traditional model
Definition of IR
System & user components
Exact match & best match searches
Strengths & weaknesses of the two match models

4 IR: problems addressed (original definition)
Calvin Mooers first introduced this term, “information retrieval”, into the literature of documentation in 1950 (Swanson, 1988).
“Inf. retrieval embraces the intellectual aspects of the description of information and its specification for search, and also whatever systems, techniques, or machines are employed to carry out the operation.” (Calvin Mooers, 1951)

5 IR: another definition
“Information retrieval is often regarded as being synonymous with document retrieval and nowadays, with text retrieval, implying that the task of an IR system is to retrieve documents or texts with information content that is relevant to a user’s information need” (Sparck Jones & Willett, 1997)

6 IR: objective & problems
Objective: provide users with effective access to, and interaction with, information resources.
Problems addressed:
1. How to organize information intellectually?
2. How to specify search & interaction intellectually?
3. What systems & techniques to use for those processes?

7 IR models
A model depicts what is involved: a choice of features, processes, and things to consider.
Several IR models have been used over time:
–traditional: oldest, most used; shows the basic elements involved
–interactive: more realistic, favored now; also shows the interactions involved; several models proposed
Each has strengths and weaknesses.
We start with the traditional model to illustrate many points, from general to specific examples.

8 Traditional IR model
[Diagram: the classic information retrieval model (Bates, 1989), with components information need, query, document representation, and match.]

9 Traditional IR model
[Diagram: the “standard” IR model (Belkin, 1993), with components information need, representation, query; texts, representation, surrogate; comparison; retrieved texts; judgment; modification.]

10 Traditional IR model
[Diagram. System side: acquisition (documents, objects); representation (indexing, ...); file organization (indexed documents). User side: problem (information need); representation (question); query (search formulation). The two sides meet in matching (searching), which produces retrieved objects; feedback.]

11 A few questions about the traditional models
1. What are the similarities and differences among these three models?
2. What do you learn about IR from them?
3. What are the strengths and weaknesses of the traditional IR model?
If possible, critique these models drawing on your own experience.

12 Acquisition (system)
Content: what is in databases
–in DIALOG: first part of the blue sheets (File Description, Subject Coverage)
Selection of documents & other objects from various sources
–in blue sheets: Sources
Mostly text-based documents
–full texts, titles, abstracts...
–but also: data, statistics, images (e.g. maps, trademarks)...
Importance: determines the contents of databases. Key to file selection!!!

13 Representation of documents, objects (system)
Indexing:
–controlled vocabulary (thesaurus)
–free-text terms (even in full texts)
Abstracting; annotating
Bibliographic description:
–author, title, source, date... (metadata)
Classifying, clustering, ranking
–Basic Index, Additional Index, Limits
Organization in fields & limits
Manual & automatic techniques
–advantages & disadvantages
Basic to what is available for searching & displaying
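A minimal sketch of automatic free-text indexing, assuming a toy record with title, abstract, author and controlled-vocabulary descriptor fields. Free-text terms are extracted automatically from each text field, while descriptors (which an indexer would normally assign from a thesaurus) are simply carried over. Keeping terms per field is what later lets a search be limited to, say, titles. The `Record` class, its field names, and the stopword list are hypothetical and are not DIALOG's record format.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


@dataclass
class Record:
    """A toy bibliographic record (field names are assumptions)."""
    title: str
    abstract: str = ""
    author: str = ""
    # Controlled-vocabulary terms, normally assigned by an indexer from a thesaurus.
    descriptors: list[str] = field(default_factory=list)


def free_text_terms(text: str) -> set[str]:
    """Automatic indexing: lowercase, split into word tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def index_record(rec: Record) -> dict[str, set[str]]:
    """Build one term set per searchable field so searches can be limited by field."""
    return {
        "title": free_text_terms(rec.title),
        "abstract": free_text_terms(rec.abstract),
        "author": {rec.author.lower()} if rec.author else set(),
        "descriptors": {d.lower() for d in rec.descriptors},
    }


rec = Record(
    title="Citrus growing in Florida",
    abstract="Oranges and grapefruit yields.",
    descriptors=["Citrus fruits"],
)
print(sorted(index_record(rec)["title"]))  # ['citrus', 'florida', 'growing']
```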

14 File organization (system)
Sequential: record (document) by record
Inverted: term by term; a list of records under each term
Combination: indexes inverted, documents sequential
When only citations are retrieved, document files are still needed
Large-file approaches: for efficient retrieval by computers
Enables searching & interplay
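To make the sequential vs. inverted distinction concrete, here is a small sketch over a hypothetical three-record collection. A sequential search reads every record in turn, while an inverted file lists, for each term, the records that contain it, so answering a one-term query is a single lookup. Real inverted files are disk-based and compressed; this in-memory dictionary is only an illustration.

```python
from collections import defaultdict

# Hypothetical toy collection: record number -> record text.
DOCS = {
    1: "apples and oranges from florida",
    2: "apples from washington",
    3: "oranges and grapefruit",
}


def sequential_search(docs: dict[int, str], term: str) -> list[int]:
    """Sequential file: read every record in turn and test it for the term."""
    return [doc_id for doc_id, text in sorted(docs.items()) if term in text.split()]


def build_inverted_file(docs: dict[int, str]) -> dict[str, list[int]]:
    """Inverted file: for each term, the sorted list of records that contain it."""
    postings: defaultdict[str, list[int]] = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            postings[term].append(doc_id)
    return dict(postings)


inverted = build_inverted_file(DOCS)
print(sequential_search(DOCS, "oranges"))  # [1, 3]  (scanned all 3 records)
print(inverted["oranges"])                 # [1, 3]  (a single lookup)
```

The "combination" on the slide corresponds to keeping `inverted` as the index while the full texts stay in the sequential `DOCS` file, fetched only once a record number is known.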

15 Problem (user)
Related to the task or situation at hand
Varies in specificity and clarity
Produces the information need
Ultimate criterion for the effectiveness of retrieval
The information need for the same problem may change, evolve, or shift during the IR process, requiring adjustments in searching
Often more than one search for the same problem over time
Critical for examination in the interview

16 Problem (user)
A question: why may the information need for the same problem change? Have you had this experience? Tell us your story.

17 Representation: question (user & possibly system)
Non-mediated: end user alone
Mediated: intermediary + user
–interviews; human-human interaction
Question analysis: selection and elaboration of terms
Focus on search terms & logic; selection of databases
Subject to changes from feedback
Various tools: thesaurus...
Roles of the intermediary
Determines the contents of searching; dynamic

18 Query: search statement (user & system)
Translation into system requirements & limits
–start of human-computer interaction
Selection of databases
Search strategy: selection of
–search terms & logic
–possible fields, delimiters
–controlled & uncontrolled vocabulary
–variations in effectiveness tactics
Reiterations from feedback
–several feedback types: relevance feedback, magnitude feedback...
–query expansion & modification
What & how of actual searching
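One simple form of query expansion can be sketched with a thesaurus: OR each concept with its related terms, then AND the concepts together into one search statement. The thesaurus entries below are invented for illustration, and the output is a generic Boolean string, not DIALOG command syntax.

```python
# Hypothetical thesaurus: a term mapped to related entry terms.
THESAURUS = {
    "apples": ["malus"],
    "oranges": ["citrus sinensis"],
    "pests": ["insects", "pest control"],
}


def expand_term(term: str) -> str:
    """Query expansion: OR a term together with its thesaurus entries."""
    variants = [term] + THESAURUS.get(term, [])
    # Quote multi-word phrases so they are read as one search term.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")"


def search_statement(concepts: list[str]) -> str:
    """AND the expanded concepts together into one Boolean search statement."""
    return " AND ".join(expand_term(c) for c in concepts)


print(search_statement(["apples", "pests", "florida"]))
# (apples OR malus) AND (pests OR insects OR "pest control") AND florida
```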

19 Matching: searching (user & system)
Process of matching, comparing
–search: which documents in the file match the query as stated?
Various search algorithms:
–exact match: Boolean, still the most prevalent
–best match: ranking by relevance, increasingly used, e.g. on the web
–hybrids incorporating both, e.g. Target and Rank in DIALOG
Each has strengths and weaknesses
–no ‘perfect’ method exists
Search interactions
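A hybrid of the two approaches might look like the following sketch: a Boolean AND filter decides which records are retrieved at all, and the survivors are then ranked by how many additional ranking terms they share. This is a hypothetical illustration of the general idea only; it is not how DIALOG's Target or Rank features actually compute their output.

```python
# Hypothetical records, each already reduced to a set of index terms.
DOCS = {
    1: {"apples", "pests", "florida", "insects"},
    2: {"apples", "washington"},
    3: {"apples", "pests", "washington"},
    4: {"oranges", "pests", "florida"},
}


def boolean_and(docs: dict[int, set[str]], required: set[str]) -> set[int]:
    """Exact match: records that contain every required term."""
    return {doc_id for doc_id, terms in docs.items() if required <= terms}


def hybrid_search(docs: dict[int, set[str]], required: set[str], ranking_terms: set[str]) -> list[int]:
    """Boolean filter first, then rank the retrieved records by shared ranking terms."""
    hits = boolean_and(docs, required)
    # Score = number of ranking terms the record shares (a deliberately simple measure).
    scored = [(len(docs[doc_id] & ranking_terms), doc_id) for doc_id in hits]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by record number
    return [doc_id for _, doc_id in scored]


print(hybrid_search(DOCS, {"apples"}, {"pests", "insects", "florida"}))  # [1, 3, 2]
```

Record 4 shares two ranking terms but is never shown, because the Boolean filter excluded it; a hybrid inherits the strictness of its exact-match stage.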

20 Retrieved documents (from system to user)
Various orders of output:
–Last In, First Out (LIFO); sorted
–ranked by relevance
–ranked by other characteristics
Various forms of output
–in DIALOG: Output options
When citations only: linkage to document delivery
Basis for relevance and utility evaluation by users
Relevance feedback
What a user sees, gets, judges
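The different output orders amount to sorting the same retrieved set on different keys. In this sketch, accession numbers stand in for the order in which records were added to the file, and the scores stand in for a best-match similarity; both fields and all values are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: int  # order in which the record was added to the file (hypothetical)
    title: str
    score: float    # similarity from some best-match algorithm (hypothetical)


hits = [
    Hit(101, "Citrus pests in Florida", 0.42),
    Hit(205, "Apple orchards", 0.10),
    Hit(150, "Orange grove insects", 0.77),
]

lifo = sorted(hits, key=lambda h: h.accession, reverse=True)  # last in, first out
ranked = sorted(hits, key=lambda h: h.score, reverse=True)    # ranked by relevance score
sorted_by_title = sorted(hits, key=lambda h: h.title)         # sorted on a field

print([h.accession for h in lifo])    # [205, 150, 101]
print([h.accession for h in ranked])  # [150, 101, 205]
```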

21 Exact match: Boolean search
You retrieve exactly what you ask for in the query:
–all documents that have the term(s) with the logical connection(s), and any other restrictions (e.g. terms must appear in titles), as stated in the query
–exactly: nothing less, nothing more
Based on matching that follows the rules of Boolean algebra, or the algebra of sets
–‘new algebra’
–represented by circles in Venn diagrams
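Since Boolean searching follows the algebra of sets, it can be sketched directly with Python sets: each term maps to the set of records containing it (its postings), and AND, OR, NOT become intersection, union and difference. The postings below are made up for illustration.

```python
# Hypothetical postings: term -> set of record numbers that contain the term.
POSTINGS = {
    "apples": {1, 2, 5},
    "oranges": {2, 3, 5},
    "florida": {3, 5, 6},
}


def docs(term: str) -> set[int]:
    """All records containing the term (empty set if the term is not indexed)."""
    return POSTINGS.get(term, set())


a_and_b = docs("apples") & docs("oranges")  # AND = intersection
a_or_b = docs("apples") | docs("oranges")   # OR  = union
a_not_b = docs("apples") - docs("oranges")  # NOT = difference

print(sorted(a_and_b), sorted(a_or_b), sorted(a_not_b))  # [2, 5] [1, 2, 3, 5] [1]
```

Every record in a result set satisfies the statement and every record that satisfies it is in the set: nothing less, nothing more, and in no particular order.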

22 Boolean algebra & Venn diagrams
Four basic operations (two overlapping circles A and B; region 1 = A only, 2 = A and B, 3 = B only):
A alone: all documents that have A. Shade 1 & 2. E.g. apples
A AND B: shade 2. E.g. apples AND oranges
A OR B: shade 1, 2, 3. E.g. apples OR oranges
A NOT B: shade 1. E.g. apples NOT oranges
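The shading answers on this slide can be checked mechanically: describe each region by its membership in A and B (region 1 = A only, 2 = both, 3 = B only, as implied by the shading) and ask in which regions each expression is true. A small sketch:

```python
from collections.abc import Callable

# Two-circle Venn diagram: region -> (inside A?, inside B?).
# Region 1 = A only, 2 = A and B, 3 = B only (as implied by the slide's shading).
REGIONS = {1: (True, False), 2: (True, True), 3: (False, True)}


def shaded(expr: Callable[[bool, bool], bool]) -> list[int]:
    """Return the regions in which the Boolean expression is true."""
    return [region for region, (a, b) in REGIONS.items() if expr(a, b)]


print(shaded(lambda a, b: a))            # A        -> [1, 2]
print(shaded(lambda a, b: a and b))      # A AND B  -> [2]
print(shaded(lambda a, b: a or b))       # A OR B   -> [1, 2, 3]
print(shaded(lambda a, b: a and not b))  # A NOT B  -> [1]
```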

23 Venn diagrams... cont.
Complex statements allowed, e.g. with three circles A, B, C (regions numbered 1 to 7 in the diagram):
(A OR B) AND C: shade 4, 5, 6. E.g. (apples OR oranges) AND Florida
(A OR B) NOT C: shade what? E.g. (apples OR oranges) NOT Florida
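The same membership check extends to three circles. The slide's region numbers 1 to 7 refer to a diagram that is not reproduced here, so this sketch names each region by the circles that contain it instead (e.g. `AC` means inside A and C but outside B). It finds three regions for (A OR B) AND C, matching the three regions shaded on the slide, and lists the regions for (A OR B) NOT C.

```python
from collections.abc import Callable
from itertools import product

# The seven regions of a three-circle diagram, each given as
# (inside A?, inside B?, inside C?); the all-False point lies outside every circle.
REGIONS = [m for m in product([True, False], repeat=3) if any(m)]


def label(region: tuple[bool, bool, bool]) -> str:
    """Name a region by the circles containing it, e.g. 'AC' = in A and C, not in B."""
    return "".join(name for name, inside in zip("ABC", region) if inside)


def shaded(expr: Callable[[bool, bool, bool], bool]) -> list[str]:
    """Labels of the regions in which the Boolean expression is true."""
    return sorted(label(r) for r in REGIONS if expr(*r))


print(shaded(lambda a, b, c: (a or b) and c))      # ['ABC', 'AC', 'BC']
print(shaded(lambda a, b, c: (a or b) and not c))  # ['A', 'AB', 'B']
```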

24 Venn diagrams cont.
Complex statements can be made
–as in ordinary algebra, e.g. (2 + 3) × 4
As in ordinary algebra, watch the parentheses:
–2 + (3 × 4) is not the same as (2 + 3) × 4
–(A AND B) OR C is not the same as A AND (B OR C)
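The effect of parentheses can be seen with small made-up sets. Python's own set operators also have a built-in precedence (`&` binds more tightly than `|`), much as × binds more tightly than +, which is why writing the parentheses out explicitly is the safer habit.

```python
# Made-up record sets for three terms.
A = {1, 2, 3}
B = {2, 4}
C = {3, 5}

print((2 + 3) * 4, 2 + (3 * 4))  # 20 14

left = (A & B) | C   # (A AND B) OR C
right = A & (B | C)  # A AND (B OR C)
print(sorted(left), sorted(right))  # [2, 3, 5] [2, 3]
```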

25 Best match searching
You retrieve documents ranked by how similar (close) they are to a query, as calculated by the system
–similarity is taken as relevance
–thus answers are presented from the documents most likely to be relevant down to less and less likely ones; the list can be cut at any desired number, e.g. the first 10
Algorithms (formulas) are used to determine similarity
–using statistical and/or linguistic properties
Web outputs are mostly ranked
But DIALOG also allows ranking, with special commands

26 Best match... cont.
Best match process:
–compares a set of query terms with the sets of terms in documents
–calculates a similarity between the query & each document based on common terms
–sorts the documents in order of similarity
–assumes that higher-ranked documents have a higher probability of being relevant
–allows for cut-off at a chosen number
BIG issue: what representation & similarity measures are best?
–considerable research & many tests
–many proprietary algorithms
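The steps above can be sketched with one common similarity measure, cosine similarity over raw term counts. That choice is an assumption: the slide stresses that many measures and proprietary algorithms exist, and this is not the formula of any particular system. Documents with no terms in common with the query are dropped, the rest are sorted by similarity, and the list is cut off at `cutoff`. The collection is hypothetical.

```python
import math
from collections import Counter


def cosine(q: Counter, d: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())  # only common terms contribute
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def best_match(query: str, docs: dict[int, str], cutoff: int = 10) -> list[tuple[int, float]]:
    """Score each document, sort by descending similarity, keep the top `cutoff`."""
    q = Counter(query.lower().split())
    scored = [(doc_id, cosine(q, Counter(text.lower().split()))) for doc_id, text in docs.items()]
    scored = [(doc_id, s) for doc_id, s in scored if s > 0]  # no common terms -> not retrieved
    scored.sort(key=lambda pair: (-pair[1], pair[0]))        # ties broken by record number
    return scored[:cutoff]


# Hypothetical collection.
DOCS = {
    1: "apples and oranges from florida",
    2: "apples from washington",
    3: "florida oranges oranges",
    4: "grapefruit",
}
print([doc_id for doc_id, _ in best_match("florida oranges", DOCS)])  # [3, 1]
```

Record 3 outranks record 1 only because it repeats "oranges"; whether that order matches a user's judgment is the "whose relevance?" question raised in the Boolean vs. best match comparison.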

27 Boolean vs. best match
Boolean
–allows for logic
–provides all that has been matched
BUT
–has no particular order of output
–treats all retrievals equally, from the most to the least relevant
–often requires examination of large outputs
Best match
–allows for free terminology
–provides for a ranked output
–provides for cut-off: output of any size
BUT
–does not include logic
–ranking method (algorithm) is not transparent: whose relevance?
–where to cut off?

28 Boolean vs. best match
Questions about best match (just thinking):
1. As a user, would you trust the algorithm's judgment if you do not read the hits?
2. Is a document judged only 10% relevant to your query necessarily less useful for resolving your information problem than one judged 40% relevant?

29 Strengths of the traditional IR model
Lists the major components in both the system and user branches
Suggests:
–what to explain to users about the system, if needed
–what to ask of users for more effective searching (problem...)
Selection of component(s) for concentration
–mostly ever-better representation
Provides a framework for evaluating (static) aspects

30 Weaknesses
Does not address or account for interaction & judgment of results by users
–identifies interaction with search only
–interaction is a much richer process
Many types of, and variables in, interaction are not reflected
Feedback has many types & functions, also not shown
Evaluation is thus one-sided
IR is a highly interactive process, so additional model(s) are needed

