
To See, or Not to See—Is That the Query? Robert R. Korfhage Dept. of Information Science University of Pittsburgh 1991 Reviewed by Yi-Bu Chen LIS 551 Information.


1 To See, or Not to See—Is That the Query? Robert R. Korfhage Dept. of Information Science University of Pittsburgh 1991 Reviewed by Yi-Bu Chen LIS 551 Information Retrieval March 23, 2004

2 Overview
• Problems of traditional IR systems
• Visual information retrieval system I: GUIDO
  ◦ Underlying model
  ◦ How it works
  ◦ Problems and development
• Visual information retrieval system II: VIBE
  ◦ Underlying model
  ◦ How it works
  ◦ Problems and development
• Conclusions

3 Problems of Traditional IR Systems
In response to a user's query, traditional IR systems return a sequential list of documents:
• The documents may or may not be ranked by relevance.
• Even if they are ranked, users do not know how the system determined relevance.
• By keeping users from viewing other documents and offering little or no relevance feedback, these systems hinder users' query reformulation.
• The sequential list does not give users a clear and comprehensive view of all the documents retrieved.
• Little or no effort is made to take user individuality into account.

4 The Questions Raised
• How can users be given a complete view of the document space, so that
  ◦ users are not kept away from the documents the system deems less relevant;
  ◦ users are able to see significant relationships among the documents?
• How can users browse and navigate large document spaces with ease?
• How can users' query reformulation be made easier?

5 GUIDO: Graphical User Interface for Data Organization
Underlying model: the vector space model
• A document collection is viewed as a multidimensional space whose dimensionality is determined by the number of vocabulary terms.
• The query vectors are reference points called points of interest (POIs), against which each document is measured.
• A document is represented by a point whose coordinates are its absolute distances from each POI.
• Similarity of a document to a POI is measured by the absolute distance (a numerical value) between the two vectors.
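The mapping from document space to distance space described on this slide can be sketched as follows. This is a minimal illustration, not GUIDO's implementation: the function names, the three-term vocabulary, and the choice of the Euclidean metric are assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two term-weight vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_coordinates(doc, pois, metric=euclidean):
    """Map a document vector to its distance-space point:
    one coordinate per POI, each the distance from the document to that POI."""
    return tuple(metric(doc, poi) for poi in pois)

# Hypothetical 3-term vocabulary; two queries act as POIs.
pois = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
doc = (1.0, 1.0, 0.0)

coords = distance_coordinates(doc, pois)
print(coords)
```

With two POIs every document becomes a pair of numbers, regardless of how many vocabulary terms the original vectors have.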

6 GUIDO: 1, 2, and 3 POIs
• With a single POI (query), GUIDO displays documents in one dimension, just like many traditional IR systems.
• With 2 POIs, the distance space (document space) is a half-infinite plank.
• With 3 POIs, the distance space is a 3-dimensional prism.
• GUIDO is most useful with 2 or 3 POIs (reference points).
Fig. 1 The GUIDO space with 2 POIs
Fig. 2 The GUIDO space with 3 POIs
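The slide does not say why the 2-POI space is a half-infinite plank. One way to see it, assuming the distance measure obeys the triangle inequality: if the two POIs are a distance D apart, every document's distances d1 and d2 must satisfy d1 + d2 >= D and |d1 - d2| <= D. That region is a strip of fixed width that extends without bound in one direction. The helper name `in_plank` below is hypothetical.

```python
def in_plank(d1, d2, poi_separation):
    """Return True if (d1, d2) is a feasible pair of distances from one
    document to two POIs that are `poi_separation` apart.

    Both conditions follow from the triangle inequality."""
    return (
        d1 >= 0
        and d2 >= 0
        and d1 + d2 >= poi_separation        # the closed end of the plank
        and abs(d1 - d2) <= poi_separation   # the two parallel sides
    )

print(in_plank(3, 4, 5))      # feasible
print(in_plank(1, 1, 5))      # too close to both POIs at once
print(in_plank(10, 1, 5))     # outside the parallel sides
print(in_plank(100, 100, 5))  # far from both POIs, still feasible
```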

7 GUIDO: A 2-POI Example
• Document icons located lower in the plank have higher similarity to both POIs than those further out.
• All documents having the same distances from both POIs appear at the same single position.
Fig. 3 The GUIDO space with 2 POIs, with distances measured using 3 different metrics (the city block metric, the Euclidean metric, and the maximal distance metric).
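The three metrics named in the Fig. 3 caption have standard definitions, sketched below for term-weight vectors. The function names and sample vectors are chosen for this example only.

```python
import math

def city_block(u, v):
    """City block (L1) metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean (L2) metric: straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def maximal_distance(u, v):
    """Maximal distance (L-infinity) metric: largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

u, v = (0.0, 0.0), (3.0, 4.0)
for metric in (city_block, euclidean, maximal_distance):
    print(metric.__name__, metric(u, v))
```

The same pair of vectors yields a different distance under each metric, which is why the document positions in the plank shift when the metric changes.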

8 GUIDO: Browsing and Retrieval
• Four models of combining distances for retrieval evaluation have been proposed.
• Once a model is chosen, a cap is formed. Documents below the cap will be retrieved.
• Placing the cap near the bottom of the plank sets a higher threshold for retrieval.
Fig. 4 The GUIDO space with 2 POIs, capped with four different retrieval models (disjunctive, conjunctive, elliptical, and Cassini oval).
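The slide names the four retrieval models but does not give their formulas. The sketch below assumes the combining functions suggested by the names: a document within the threshold of either POI (disjunctive, minimum), within the threshold of both (conjunctive, maximum), with a bounded sum of distances (an ellipse is the curve of constant distance sum to two foci), or with a bounded product (a Cassini oval is the curve of constant distance product). Treat these as assumptions rather than GUIDO's definitions.

```python
# Assumed combining functions for a 2-POI display; not taken from the slides.
COMBINERS = {
    "disjunctive": lambda d1, d2: min(d1, d2),
    "conjunctive": lambda d1, d2: max(d1, d2),
    "elliptical": lambda d1, d2: d1 + d2,
    "cassini": lambda d1, d2: d1 * d2,
}

def is_retrieved(d1, d2, threshold, model):
    """Return True if the document at distance-space point (d1, d2)
    falls below the cap defined by `model` and `threshold`."""
    return COMBINERS[model](d1, d2) <= threshold

# One document, close to the first POI and far from the second.
results = {m: is_retrieved(1.0, 4.0, 3.0, m) for m in COMBINERS}
print(results)
```

Under these assumptions, lowering `threshold` moves the cap toward the bottom of the plank and retrieves fewer documents.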

9 GUIDO: Redefining the Display
• Documents unrelated to the POIs are hidden.
• By changing POIs, new document spaces can be generated (query reformulation).
• Four ways of creating new POIs:
  ◦ Modify the weights of the terms in an existing POI.
  ◦ Combine given POIs into a new one.
  ◦ Select a particular document to be a POI.
  ◦ Calculate a new POI from an interesting cluster of documents by a simple cluster analysis.
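The four ways of creating a POI can be illustrated on plain term-weight vectors. The slides do not specify how POIs are combined or how the cluster analysis works, so the component-wise mean used for both is an assumption, and all names here are hypothetical.

```python
def reweight_poi(poi, factors):
    """Way 1: scale each term weight of an existing POI."""
    return tuple(w * f for w, f in zip(poi, factors))

def mean_vector(vectors):
    """Component-wise mean; used here (as an assumption) both to combine
    POIs (way 2) and to summarize a document cluster (way 4)."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))

poi_a = (1.0, 0.0, 2.0)
poi_b = (0.0, 1.0, 0.0)
docs = [(1.0, 1.0, 0.0), (3.0, 1.0, 0.0), (2.0, 4.0, 0.0)]

reweighted = reweight_poi(poi_a, (1.0, 1.0, 0.5))  # way 1
combined = mean_vector([poi_a, poi_b])             # way 2
from_document = docs[0]                            # way 3: a document as POI
from_cluster = mean_vector(docs[:2])               # way 4: cluster centroid

print(reweighted, combined, from_document, from_cluster)
```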

10 GUIDO: Problems and Future Directions
• Information is lost in changing from the document space model to the distance space model.
  ◦ Document space model: only documents with identical descriptors are mapped to the same point.
  ◦ Distance space model: any documents whose distances to the various POIs are the same are mapped to the same point.
• Inter-document distance is not represented.
• Future direction: integrate GUIDO with other visual interfaces (such as BIRD, which is based on the Boolean model) to create a more complete document handling system.
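The information loss is easy to reproduce. In the sketch below, two documents with different descriptors land on the same distance-space point; the vectors, the POIs, and the Euclidean metric are made up for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two term-weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_point(doc, pois):
    """Distance-space coordinates of a document: one distance per POI."""
    return tuple(euclidean(doc, poi) for poi in pois)

pois = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
doc_x = (0.0, 0.0, 1.0)  # uses only the third term
doc_y = (1.0, 1.0, 1.0)  # uses all three terms

point_x = distance_point(doc_x, pois)
point_y = distance_point(doc_y, pois)

print(doc_x == doc_y)      # distinct in document space
print(point_x == point_y)  # indistinguishable in distance space
```

The two documents are far apart from each other in the original space, which also illustrates why inter-document distance cannot be read off the display.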

11 VIBE: VIsualization By Example
• Developed by Olsen et al. (1991) on the basis of the same distance paradigm as GUIDO, with further reduction of dimensionality.
• A document collection is viewed as a TWO-dimensional space.
• The POIs can be positioned anywhere (NOT fixed as in GUIDO), and users can define as many POIs as they want.
• A document is represented by a point whose coordinates are the RATIOS of its distances from each POI (NOT the absolute distances used in GUIDO).
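The slide does not give VIBE's positioning formula. One way to realize "position by ratios" is a weighted average of the POIs' screen positions, where each weight is the document's strength with respect to that POI. The sketch below assumes that reading; the function name, the screen coordinates, and the scores are all hypothetical.

```python
def vibe_position(poi_positions, scores):
    """Place a document at the weighted average of the POI screen positions.

    `scores` holds the document's strength for each POI. Only the ratios
    between scores affect the result. Returns None when every score is
    zero, since such a document has no defined position."""
    total = sum(scores)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(scores, poi_positions)) / total
    y = sum(s * py for s, (_, py) in zip(scores, poi_positions)) / total
    return (x, y)

# Three POIs placed freely on a 2-D display.
poi_positions = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

weak_doc = vibe_position(poi_positions, (1.0, 1.0, 0.0))
strong_doc = vibe_position(poi_positions, (2.0, 2.0, 0.0))
print(weak_doc, strong_doc)
```

Both documents land midway between the first two POIs because their score ratios match, even though one is twice as strongly related to the POIs as the other.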

12 VIBE: An Example

13 GUIDO vs. VIBE
GUIDO
• POIs are fixed; 2 or 3 are the most useful.
• POIs can be modified.
• Document position is determined by the absolute distances to each POI.
• Position of a document indicates the strength of each POI with respect to the document.
• Loss of information, as different documents with the same distances to each POI are mapped to the same point.
VIBE
• POIs are NOT fixed; 3 or more are the most useful.
• POIs can be modified AND moved.
• Document position is determined by the ratios of the distances to each POI.
• Position of a document indicates the relative strength of each POI with respect to the document.
• Icon size is proportional to the strength of the most significant POI.
• Further loss of information, as different documents with the same ratios of distances to each POI are mapped to the same point.

14 VIBE: Recent development

15 Conclusions
• Both GUIDO and VIBE give users full visibility of a large document set and allow them to control retrieval dynamically.
• Both GUIDO and VIBE let users modify POIs and immediately see the effect of such modifications on the document space, which greatly facilitates query reformulation.
• Unlike traditional IR systems, GUIDO and VIBE enable users to browse the database and determine the relevance of a document before retrieving it.
• Limitations in the displays will need to be resolved before GUIDO and VIBE can go mainstream.

16 Additional Readings
• Korfhage RR and Olsen KA, 1991. Information display: control of visual representations. IEEE.
• Nuchprayoon A and Korfhage RR, 1994. GUIDO, a visual tool for retrieving documents. IEEE.
• Korfhage RR and Nuchprayoon A, 1997. GUIDO: visualizing document retrieval. IEEE.
• Morse E, Lewis M, and Olsen KA. Testing Visual Information Retrieval Methodologies Case Study: Comparative Analysis of Textual, Icon, Graphical and “Spring” Displays. Manuscript on the web.
