1
SIMS 247 Information Visualization and Presentation Marti Hearst March 15, 2002
2
Outline Why Text is Tough Visualizing Concept Spaces –Clusters –Category Hierarchies Visualizing Query Specifications Visualizing Retrieval Results Usability Study Meta-Analysis
3
Why Visualize Text? To help with Information Retrieval –give an overview of a collection –show user what aspects of their interests are present in a collection –help user understand why documents were retrieved as a result of a query Text Data Mining –Mainly clustering & nodes-and-links Software Engineering –not really text, but has some similar properties
4
Why Text is Tough Text is not pre-attentive Text consists of abstract concepts –which are difficult to visualize Text represents similar concepts in many different ways –space ship, flying saucer, UFO, figment of imagination Text has very high dimensionality –Tens or hundreds of thousands of features –Many subsets can be combined together
5
Why Text is Tough The Dog.
6
Why Text is Tough The Dog. The dog cavorts. The dog cavorted.
7
Why Text is Tough The man. The man walks.
8
Why Text is Tough The man walks the cavorting dog. So far, we can sort of show this in pictures.
9
Why Text is Tough As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls. How do we visualize this?
10
Why Text is Tough Abstract concepts are difficult to visualize Combinations of abstract concepts are even more difficult to visualize –time –shades of meaning –social and psychological concepts –causal relationships
11
Why Text is Tough Language only hints at meaning Most meaning of text lies within our minds and common understanding –“How much is that doggy in the window?” how much: social system of barter and trade (not the size of the dog) “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own “in the window” implies behind a store window, not really inside a window, requires notion of window shopping
12
Why Text is Tough General categories have no standard ordering (nominal data) Categorization of documents by single topics misses important distinctions Consider an article about –NAFTA –The effects of NAFTA on truck manufacture –The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez
13
Why Text is Tough Other issues about language –ambiguous (many different meanings for the same words and phrases) –different combinations imply different meanings
14
Why Text is Tough I saw Pathfinder on Mars with a telescope. Pathfinder photographed Mars. The Pathfinder photograph mars our perception of a lifeless planet. The Pathfinder photograph from Ford has arrived. The Pathfinder forded the river without marring its paint job.
15
Why Text is Easy Text is highly redundant –When you have lots of it –Pretty much any simple technique can pull out phrases that seem to characterize a document Instant summary: –Extract the most frequent words from a text –Remove the most common English words
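The "instant summary" recipe above can be sketched in a few lines of Python. This is a toy illustration; the stopword set here is a tiny stand-in for a real list of common English words.

```python
import re
from collections import Counter

# A tiny stand-in for a real stopword list of common English words.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "is",
             "was", "he", "it", "for", "with", "as", "his", "on"}

def instant_summary(text, k=10):
    """Return the k most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)

print(instant_summary("The dog cavorts. The dog walks. The man walks the dog.", 3))
```

Run over a large text such as Genesis, this produces exactly the kind of term-frequency list shown on the next slide.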
16
Guess the Text 478 said 233 god 201 father 187 land 181 jacob 160 son 157 joseph 134 abraham 121 earth 119 man 118 behold 113 years 104 wife 101 name 94 pharaoh
17
Text Collection Overviews How can we show an overview of the contents of a text collection? –Show info external to the docs e.g., date, author, source, number of inlinks does not show what they are about –Show the meanings or topics in the docs a list of titles results of clustering words or documents organize according to categories (next time)
18
Clustering for Collection Overviews –Scatter/Gather show main themes as groups of text summaries –Scatter Plots show docs as points; closeness indicates nearness in cluster space show main themes of docs as visual clumps or mountains –Kohonen Feature maps show main themes as adjacent polygons –BEAD show main themes as links within a force-directed placement network
19
Clustering for Collection Overviews Two main steps –cluster the documents according to the words they have in common –map the cluster representation onto an (interactive) 2D or 3D representation
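Step 1 can be sketched with a toy greedy clustering over word-overlap (cosine) similarity. This is a simplification for illustration, not the k-means-style algorithms the real systems used; step 2 (laying the clusters out in 2D or 3D, e.g. via multidimensional scaling) is out of scope here.

```python
import math
from collections import Counter

def vectorize(doc):
    """Bag-of-words term vector for one document."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(docs, threshold=0.5):
    """Group documents by shared words: each doc joins the most
    similar existing cluster, or starts a new one (a stand-in for
    real clustering; step 2 would map these clusters to 2D)."""
    clusters = []  # list of (centroid term counts, member indices)
    for i, v in enumerate(vectorize(d) for d in docs):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, c[0])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((v, [i]))
        else:
            best[0].update(v)   # fold doc into the centroid
            best[1].append(i)
    return [members for _, members in clusters]

docs = ["stars and galaxies in astronomy",
        "galaxies stars astrophysics",
        "film and tv stars",
        "tv film celebrities"]
print(greedy_cluster(docs))  # astronomy docs vs. film/tv docs
```

The example mirrors the "star" query on the later Scatter/Gather slide: the same word in different company lands in different themes.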
20
Text Clustering Finds overall similarities among groups of documents Finds overall similarities among groups of tokens Picks out some themes, ignores others
21
Scatter/Gather
22
S/G Example: query on “star” Encyclopedia text 14 sports 8 symbols 47 film, tv 68 film, tv (p) 7 music 97 astrophysics 67 astronomy (p) 12 stellar phenomena 10 flora/fauna 49 galaxies, stars 29 constellations 7 miscellaneous Clustering and re-clustering is entirely automated
23
Scatter/Gather Cutting, Pedersen, Tukey & Karger 92, 93, Hearst & Pedersen 95 How it works –Cluster sets of documents into general “themes”, like a table of contents –Display the contents of the clusters by showing topical terms and typical titles –User chooses subsets of the clusters and re-clusters the documents within –Resulting new groups have different “themes” Originally used to give collection overview Evidence suggests more appropriate for displaying retrieval results in context Appearing (sort-of) in commercial systems
24
Northern Light Web Search: Started out with clustering. Then integrated with categories. Now does not do web search and uses only categories.
25
Teoma: appears to combine categories and clusters
26
Scatter Plot of Clusters (Chen et al. 97)
27
BEAD (Chalmers 97)
28
BEAD (Chalmers 96) An example layout produced by Bead, seen in overview, of 831 bibliography entries. The dimensionality (the number of unique words in the set) is 6925. A search for ‘cscw or collaborative’ shows the pattern of occurrences coloured dark blue, mostly to the right. The central rectangle is the visualizer’s motion control.
29
Example: Themescapes (Wise et al. 95)
30
Clustering for Collection Overviews Since text has tens of thousands of features –the mapping to 2D loses a tremendous amount of information –only very coarse themes are detected
31
Galaxy of News Rennison 95
32
Galaxy of News Rennison 95
33
Kohonen Feature Maps (Lin 92, Chen et al. 97) (594 docs)
34
Study of Kohonen Feature Maps H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7) Comparison: Kohonen Map and Yahoo Task: –“Window shop” for interesting home page –Repeat with other interface Results: –Starting with map could repeat in Yahoo (8/11) –Starting with Yahoo unable to repeat in map (2/14)
35
How Useful is Collection Cluster Visualization for Search? Three studies find negative results
36
Study 1 Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive users. In Proc. of the 5th Annual Symposium on Document Analysis and Information Retrieval, 1996. This study compared –a system with 2D graphical clusters –a system with 3D graphical clusters –a system that shows textual clusters Novice users Only textual clusters were helpful (and they were difficult to use well)
37
Study 2: Kohonen Feature Maps H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7) Comparison: Kohonen Map and Yahoo Task: –“Window shop” for interesting home page –Repeat with other interface Results: –Starting with map could repeat in Yahoo (8/11) –Starting with Yahoo unable to repeat in map (2/14)
38
Study 2 (cont.) Participants liked: –Correspondence of region size to # documents –Overview (but also wanted zoom) –Ease of jumping from one topic to another –Multiple routes to topics –Use of category and subcategory labels
39
Study 2 (cont.) Participants wanted: –hierarchical organization –other ordering of concepts (alphabetical) –integration of browsing and search –correspondence of color to meaning –more meaningful labels –labels at same level of abstraction –fit more labels in the given space –combined keyword and category search –multiple category assignment (sports+entertain)
40
Study 3: NIRVE NIRVE Interface by Cugini et al. 96. Each rectangle is a cluster. Larger clusters closer to the “pole”. Similar clusters near one another. Opening a cluster causes a projection that shows the titles.
41
Study 3 Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces. Sebrechts, Cugini, Laskowski, Vasilakis and Miller, Proceedings of SIGIR 99, Berkeley, CA, 1999. This study compared: –3D graphical clusters –2D graphical clusters –textual clusters 15 participants, between-subject design Tasks –Locate a particular document –Locate and mark a particular document –Locate a previously marked document –Locate all clusters that discuss some topic –List the most frequently represented topics
42
Study 3 Results (time to locate targets) –Text clusters fastest –2D next –3D last –With practice (6 sessions) 2D neared text results; 3D still slower –Computer experts were just as fast with 3D Certain tasks equally fast with 2D & text –Find particular cluster –Find an already-marked document But anything involving text (e.g., find title) much faster with text. –Spatial location rotated, so users lost context Helpful viz features –Color coding (helped text too) –Relative vertical locations
43
Visualizing Clusters Huge 2D maps may be an inappropriate focus for information retrieval –cannot see what the documents are about –space is difficult to browse for IR purposes –(tough to visualize abstract concepts) Perhaps more suited for pattern discovery and gist-like overviews
44
Co-Citation Analysis Has been around since the 50’s (Small, Garfield, White & McCain) Used to identify core sets of –authors, journals, articles for particular fields –Not for general search Main Idea: –Find pairs of papers that cite third papers –Look for commonalities A nice demonstration by Eugene Garfield at: –http://165.123.33.33/eugene_garfield/papers/mapsciworld.html
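The counting at the heart of co-citation analysis can be sketched as follows: pairs of papers that are repeatedly cited together by third papers surface as a field's core set. The paper labels here are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(bibliographies):
    """Count how often each pair of papers is cited together by a
    third paper; high-count pairs indicate a field's core set."""
    pairs = Counter()
    for refs in bibliographies:
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Each inner list is the reference list of one citing paper
# (labels are hypothetical).
citing = [["Small73", "Garfield55", "White81"],
          ["Small73", "Garfield55"],
          ["Garfield55", "McCain90"]]
print(cocitation_counts(citing).most_common(1))
```

Here the pair cited together twice stands out over pairs cited together only once.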
45
Co-citation analysis (From Garfield 98)
48
Category Combinations Let’s show categories instead of clusters
49
DynaCat (Pratt, Hearst, & Fagan 99)
50
DynaCat (Pratt 97) Decide on important question types in advance –What are the adverse effects of drug D? –What is the prognosis for treatment T? Make use of MeSH categories Retain only those types of categories known to be useful for this type of query.
51
DynaCat Study Design –Three queries –24 cancer patients –Compared three interfaces ranked list, clusters, categories Results –Participants strongly preferred categories –Participants found more answers using categories –Participants took same amount of time with all three interfaces
52
HiBrowse
53
Category Combinations HiBrowse Problem: –Search is not integrated with browsing of categories –Only see the subset of categories selected (and the corresponding number of documents)
54
MultiTrees (Furnas & Zacks ’94)
55
Cat-a-Cone: Multiple Simultaneous Categories Key Ideas: –Separate documents from category labels –Show both simultaneously Link the two for iterative feedback Distinguish between: –Searching for Documents vs. –Searching for Categories
56
Cat-a-Cone Interface
57
Cat-a-Cone Catacomb: (definition 2b, online Websters) “A complex set of interrelated things” Makes use of earlier PARC work on 3D+animation: Rooms Henderson and Card 86 IV: Cone Tree Robertson, Card, Mackinlay 93 Web Book Card, Robertson, York 96
58
(Diagram: query terms link a search over the Collection, yielding Retrieved Documents, with browsing of the Category Hierarchy)
59
ConeTree for Category Labels Browse/explore category hierarchy –by search on label names –by growing/shrinking subtrees –by spinning subtrees Affordances –learn meaning via ancestors, siblings –disambiguate meanings –all cats simultaneously viewable
60
Virtual Book for Result Sets –Categories on Page (Retrieved Document) linked to Categories in Tree –Flipping through Book Pages causes some Subtrees to Expand and Contract –Most Subtrees remain unchanged –Book can be Stored for later Re-Use
61
Improvements over Standard Category Interfaces Integrate category selection with viewing of categories Show all categories + context Show relationship of retrieved documents to the category structure But … do users understand and like the 3D?
62
The FLAMENCO Project Basic idea similar to Cat-a-Cone But use familiar HTML interaction to achieve similar goals Usability results are very strong for users who care about the collection.
63
Query Specification
64
Command-Based Query Specification command attribute value connector … –find pa shneiderman and tw user# What are the attribute names? What are the command names? What are allowable values?
65
Form-Based Query Specification (Altavista)
66
Form-Based Query Specification (Melvyl)
67
Form-based Query Specification (Infoseek)
68
Direct Manipulation Spec. VQUERY (Jones 98)
69
Menu-based Query Specification (Young & Shneiderman 93)
70
Context
71
Putting Results in Context Visualizations of Query Term Distribution –KWIC, TileBars, SeeSoft Visualizing Shared Subsets of Query Terms –InfoCrystal, VIBE, Lattice Views Table of Contents as Context –Superbook, Cha-Cha, DynaCat Organizing Results with Tables –Envision, SenseMaker Using Hyperlinks –WebCutter
72
Putting Results in Context Interfaces should –give hints about the roles terms play in the collection –give hints about what will happen if various terms are combined –show explicitly why documents are retrieved in response to the query –summarize compactly the subset of interest
73
KWIC (Keyword in Context) An old standard, ignored until recently by internet search engines –used in some intranet engines, e.g., Cha-Cha
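A KWIC display is simple to produce: show each hit in a fixed-width window of surrounding text, with the keyword aligned in a central column. A minimal sketch (not Cha-Cha's actual implementation):

```python
import re

def kwic(text, keyword, width=30):
    """Return one context line per occurrence of keyword, with the
    keyword aligned in a center column."""
    lines = []
    for m in re.finditer(re.escape(keyword), text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines

text = ("The man walks the cavorting dog. As the man walks, "
        "the dog cavorts beside him.")
for line in kwic(text, "dog"):
    print(line)
```

Scanning down the aligned column shows at a glance the roles a term plays across hits.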
74
Display of Retrieval Results Goal: minimize time/effort for deciding which documents to examine in detail Idea: show the roles of the query terms in the retrieved documents, making use of document structure
75
TileBars Graphical Representation of Term Distribution and Overlap Simultaneously Indicate: –relative document length –query term frequencies –query term distributions –query term overlap
76
Query terms: What roles do they play in retrieved documents? DBMS (Database Systems) Reliability Mainly about both DBMS & reliability Mainly about DBMS, discusses reliability Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability Mainly about high-tech layoffs Example
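The idea can be sketched in text form: chop the document into fixed-size tiles, count each query term per tile, and shade each cell by its count. This is a toy rendition of TileBars' grayscale squares, not the original implementation.

```python
def tilebar(doc_words, query_terms, tile_size=10):
    """One row per query term; darker cells mean more hits of that
    term in that tile of the document."""
    shades = " .:#"   # 0 hits ... 3+ hits
    tiles = [doc_words[i:i + tile_size]
             for i in range(0, len(doc_words), tile_size)]
    rows = {}
    for term in query_terms:
        counts = [min(t.count(term), 3) for t in tiles]
        rows[term] = "".join(shades[c] for c in counts)
    return rows

# Toy document: heavy on "dbms" early, on "recovery" late
# ("x" stands in for other words).
words = ("dbms dbms recovery x x x x x x x "
         "dbms x x x x x x x x x "
         "x x recovery recovery recovery x x x x x").split()
for term, row in tilebar(words, ["dbms", "recovery"]).items():
    print(f"{term:>9} |{row}|")
```

A doc mainly about both terms shows dark cells in both rows throughout; a doc with a subtopic discussion shows a dark patch in just one region, as in the roles listed above.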
79
Exploiting Visual Properties –Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ‘83) –Varying shades of gray show varying quantities better than color (Tufte ‘83) –Differences in shading should align with the values being presented (Kosslyn et al. ‘83)
80
Key Aspect: Faceted Queries Conjunct of disjuncts Each disjunct is a concept –osteoporosis, bone loss –prevention, cure –research, Mayo clinic, study User does not have to specify which are main topics, which are subtopics Ranking algorithm gives higher weight to overlap of topics –This kind of query works better for high-precision queries than similarity search (Hearst 95)
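A minimal sketch of the scoring idea, with hypothetical facet term sets: a document touching many facets outranks one that merely repeats a single facet's terms (this is a simplification of the actual ranking in Hearst 95).

```python
def facet_score(doc_words, facets):
    """Score a document by how many facets (disjuncts) it touches;
    overlap of topics outweighs repeated hits on one topic."""
    words = set(doc_words)
    return sum(any(term in words for term in facet)
               for facet in facets)

# Conjunct of disjuncts: each set is one concept (hypothetical terms).
facets = [{"osteoporosis", "bone"},
          {"prevention", "cure"},
          {"research", "study"}]
doc_a = "a study on osteoporosis prevention".split()
doc_b = "osteoporosis osteoporosis osteoporosis".split()
print(facet_score(doc_a, facets), facet_score(doc_b, facets))
```

doc_a touches all three facets and ranks above doc_b, which repeats terms from only one.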
81
TileBars Summary Preliminary User Studies –users understand them –find them helpful in some situations, but probably slower than just reading titles –sometimes terms need to be disambiguated
82
SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)
83
Query Term Subsets Show which subsets of query terms occur in which subsets of retrieved documents
84
Term Occurrences in Results Sets Show how often each query term occurs in retrieved documents –VIBE (Korfhage ‘91) –InfoCrystal (Spoerri ‘94) –Problems: can’t see overlap of terms within docs quantities not represented graphically more than 4 terms hard to handle no help in selecting terms to begin with
85
InfoCrystal (Spoerri 94)
86
VIBE (Olson et al. 93, Korfhage 93)
87
Term Occurrences in Results Sets –Problems: can’t see overlap of terms within docs quantities not represented graphically more than 4 terms hard to handle no help in selecting terms to begin with
88
DLITE (Cousins 97) Supporting the Information Seeking Process –UI to a digital library Direct manipulation interface Workcenter approach –experts create workcenters –lots of tools for one task –contents persistent
89
Slide by Shankar Raman DLITE (Cousins 97) Drag and Drop interface Reify queries, sources, retrieval results Animation to keep track of activity
90
IR Infovis Meta-Analysis (Chen & Yu ’00) Goal –Find invariant underlying relations suggested collectively by empirical findings from many different studies Procedure –Examine the literature of empirical infoviz studies 35 studies between 1991 and 2000 27 focused on information retrieval tasks But due to wide differences in the conduct of the studies and the reporting of statistics, could use only 6 studies
91
IR Infovis Meta-Analysis (Chen & Yu ’00) Conclusions: –IR Infoviz studies not reported in a standard format –Individual cognitive differences had the largest effect Especially on accuracy Somewhat on efficiency –Holding cognitive abilities constant, users did better with simpler visual-spatial interfaces –The combined effect of visualization is not statistically significant –Misc: TileBars and Scatter/Gather are well-known enough to not require citations!!