1
Visualization in Text Analysis Problems. Marti Hearst, School of Information, UC Berkeley. VAC Consortium Meeting, Stanford, May 24, 2006
2
Outline: Some visualization design principles, illustrated with a new example. Why text is tricky to visualize. How to do good visualization design with text while meeting analysts’ needs? Focus on flexibility with reproducibility. Examples from 4 different domains.
3
What Makes for a Good Visualization? Visually illuminates important aspects of the underlying data and domain. Supports the users’ tasks (better than without the visualization). Adheres to good design principles.
4
Example from Software Engineering: Marat Boshernitsan, UC Berkeley PhD Dissertation, 2006. Problem: need to make complex changes throughout code. Example: convert from one API to another.
5
A Typical Solution Either requires programmers to understand and manipulate abstract syntax trees … Or requires learning another programming language (or both)!
6
First Attempt
7
Second Attempt
8
A Better Solution Build on how programmers think about programming. Operate on the textual representation of code.
9
Users Operate on Familiar Visual Representation of Code
10
Context-and-Domain Sensitive Visual Cues
11
Lessons from this Example: User-centered design. This was the third attempt; the first 2 attempts did not accurately reflect how users think about the problem. Careful design of labels and interaction cues. Very intelligent backend, but user-activated. Visually and interactively reflects how programmers think about programming.
12
What Makes for a Good Visualization for Analysts? Visually illuminates important aspects of the underlying data and domain. Supports the users’ tasks (better than without the visualization). Adheres to good design principles.
13
Goals vs. Tasks. Analysts’ goals: understand current and past situations; predict and anticipate future situations. Observations by Pirolli & Card ’05: different analysts starting with people, organizations, tasks, and time: predict coup likelihood; understand bio-warfare threats; understand relations within cartel.
14
Goals vs. Tasks. Analysts’ tasks: explore, extract, filter, link, arrange, compare, hypothesize (a combination of foraging and sensemaking). Should do the tasks only to support the goals.
15
Design Principles for Analysts: Experienced analysts notice what is missing or unexpected (Wright et al. ’06). Thus consistency and reproducibility are important.
16
Design Principles for Analysts: Analysts must guard against confirmation bias (Pirolli & Card ’05). Thus it is important for analysts to be able to easily arrange and re-arrange, and to view information flexibly from many angles, while at the same time retaining consistency and reproducibility. However … it’s hard to do this with text.
17
Working with Text: Text is especially difficult to visualize. Very high dimensionality: tens to hundreds of thousands of features. Compositional: can be combined together in innumerable ways. Abstract: and so difficult to visualize. Not pre-attentive: must foveate to read. Subtle: small differences matter. Unordered.
18
Text Meaning is NOT pre-attentive SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
19
Why Text is Tough: Abstract concepts are difficult to visualize. Combinations of abstract concepts are even more difficult to visualize: time, shades of meaning, social and psychological concepts, causal relationships.
20
Why Text is Tough: The dog.
21
Why Text is Tough: The dog. The dog cavorts. The dog cavorted.
22
Why Text is Tough: The man. The man walks.
23
Why Text is Tough: The man walks the cavorting dog. So far, we can sort of show this in pictures.
24
Why Text is Tough: As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls. How do we visualize this?
25
Why Text is Tough: Language only hints at meaning. Most meaning of text lies within our minds and common understanding. “How much is that doggy in the window?” “how much”: social system of barter and trade (not the size of the dog). “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own. “in the window” implies behind a store window, not really inside a window; requires notion of window shopping.
26
Why Text is Tough: General categories have no standard ordering (nominal data). Categorization of documents by single topics misses important distinctions. Consider an article about NAFTA: the effects of NAFTA on truck manufacture; the effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez.
27
Other issues about language: Ambiguous (many different meanings for the same words and phrases). Same meaning implied by different combinations. Different combinations imply different meanings.
28
Why Text is (Deceptively) Easy: Text is easier when you have a lot of it (web search now usually uses conjunction). Text has a lot of redundancy. A very simple algorithm can: pull out “important” phrases; find “meaningfully” related words; create a “summary” from a document; group “related” documents.
29
Why Text is Easy: Pretty much any simple technique can pull out phrases that seem to characterize a document. Most frequent words from an IR lecture (count, word): 109 slide, 69 to, 37 view, 37 version, 37 graphic, 37 first, 37 back, 36 previous, 36 next, 32 of, 31 the, 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents, 21 and, 18 evaluate, 15 a, 13 what, 13 vs, 13 how, 12 trec, 12 is, 12 high, 12 for, 10 relevance, 10 queries, 10 on, 9 information, 8 x, 8 why, 8 as, 8 answer, 7 search, 7 maron, 7 document, 7 blair, 6 top, 6 results, 6 measure, 6 length, 6 in, 6 evaluation, 6 curves.
30
Why Text is Easy: Same text, removing the most frequent words in the language and the most frequent in this text (count, word): 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents, 18 evaluate, 13 vs, 12 trec, 12 high, 10 relevance, 10 queries, 9 information, 8 x, 8 answer, 7 search, 7 maron, 7 document, 7 blair, 6 top, 6 results, 6 measure, 6 length, 6 evaluation, 6 curves. These words can act as a simple summary of the document. People are good at inferring (sometimes inventing) the commonalities. People are bad at realizing what they are not seeing.
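The word lists above can be approximated with a few lines of Python. This is only a sketch of the general technique, not the script used for the talk: the tokenizer, the tiny stop list, the `doc_noise` parameter, and the sample string are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; real lists are much longer).
STOPWORDS = {"a", "and", "as", "for", "how", "in", "is",
             "of", "on", "the", "to", "what", "why"}


def word_counts(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def summary_terms(text, stopwords=STOPWORDS, doc_noise=(), top_n=10):
    """Return the top_n most frequent terms after dropping common-language
    words (stopwords) and words that are frequent only because of this
    particular text's boilerplate (doc_noise)."""
    drop = set(stopwords) | set(doc_noise)
    kept = Counter({w: c for w, c in word_counts(text).items() if w not in drop})
    return kept.most_common(top_n)


# Hypothetical sample standing in for the lecture text.
sample = (
    "slide next slide previous slide the recall of the relevant documents "
    "and the precision of the retrieved documents recall and precision"
)
top = summary_terms(sample, doc_noise={"slide", "next", "previous"}, top_n=3)
print(top)
```

Note what the sketch cannot do: it reports only what is frequent, so anything rare but important never appears in the output, which is the risk the slide warns about.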
31
Simple Text Analysis can Mislead: Most-frequent-word lists are biased towards concepts with unique identifiers. From Spink, Wolfram, Jansen, Saracevic, JASIS ’01
32
Major Trends vs. Minor Discoveries: With text, it’s easy to extract and show the largest, main trends. But often we want the rare but unexpected and important event: Russian oil company example; Schwarzenegger and Enron; cigarettes and kids; person on the periphery who is working stealthily to influence things. This is really difficult to solve!
33
Design Principles for Analysts: Experienced analysts notice what is missing or unexpected. Analysts must guard against confirmation bias. Need to be able to easily arrange and re-arrange, and to view information flexibly from many angles, while at the same time retaining consistency and reproducibility. Interfaces should reflect the domain and data. How to achieve this with text collections? Must transform text in understandable ways. Must provide multiple, consistent views that nevertheless allow for new discovery and insight.
34
Why Emphasize Flexibility? We can’t view representations of all the text content at once. Instead, we need ways to flexibly navigate, group, organize, and explore, and to see important pieces over time.
35
The Importance of Flexibility (Russell, Slaney, Qu, Houston ’05): The ease of viewing and manipulation in the system strongly influenced the kind of analysis operations done.
36
Examples of Flexibility on Text Data: PaperLens (conference proceedings); TAKMI (customer service requests); Faceted Browsing (e-commerce): Flamenco, eBay Express, FaThumb; TRIST and Sandbox (analysts)
37
Flexible views: InfoVis 2004 contest. Visualize 8 years of conference proceedings. Tasks: 1. Static overview of 10 years of InfoVis 2. Characterize the research areas and their evolution 3. The people in InfoVis 4. Which papers/authors are most often referenced? 5. How many papers conducted a user study? PaperLens: integrated solution by Lee, Czerwinski, Robertson, Bederson. Uses graphical elements and brushing and linking to flexibly elucidate a collection’s contents. http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
40
Flexibility in Foraging and Analysis: TAKMI, by Nasukawa and Nagano, ’01. The system integrates: analysis tasks (customer service help), content analysis, information visualization.
41
Flexibility in Analysis TAKMI, by Nasukawa and Nagano, 2001 Documents containing “windows 98”
42
Flexibility in Analysis TAKMI, by Nasukawa and Nagano, 2001 Patent documents containing “inkjet”, organized by entity and year
43
Flexibility in Category Navigation Browsing Information Collections using (Hierarchical) Faceted Metadata
44
What are facets? Sets of categories, each of which describes a different aspect of the objects in the collection. Each of these can be hierarchical. (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.) Examples: Time/Date, Topic, GeoRegion
45
Facet example: Recipes. Course: Main Course. Cooking Method: Stir-fry. Cuisine: Thai. Ingredient: Red Bell Pepper, Curry, Chicken.
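Faceted navigation can be illustrated with a small filter-and-count sketch. It is a flat (non-hierarchical) simplification, not how Flamenco or any other system named in this talk is implemented; the first record follows the recipe example as read from the slide, and the other records, field names, and function names are hypothetical.

```python
from collections import Counter

# Each item carries a set of values per facet. Only the first record comes
# from the slide; the rest are invented for illustration.
RECIPES = [
    {"title": "Thai red curry chicken", "Course": {"Main Course"},
     "Cooking Method": {"Stir-fry"}, "Cuisine": {"Thai"},
     "Ingredient": {"Red Bell Pepper", "Curry", "Chicken"}},
    {"title": "Pad thai", "Course": {"Main Course"},
     "Cooking Method": {"Stir-fry"}, "Cuisine": {"Thai"},
     "Ingredient": {"Noodles", "Peanut"}},
    {"title": "Mango sticky rice", "Course": {"Dessert"},
     "Cooking Method": {"Steam"}, "Cuisine": {"Thai"},
     "Ingredient": {"Mango", "Rice"}},
    {"title": "Roast chicken", "Course": {"Main Course"},
     "Cooking Method": {"Roast"}, "Cuisine": {"French"},
     "Ingredient": {"Chicken"}},
]


def select(items, constraints):
    """Keep items that carry the chosen value in every constrained facet."""
    return [it for it in items
            if all(value in it.get(facet, set())
                   for facet, value in constraints.items())]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of a facet.
    These counts are what a faceted interface shows as query previews."""
    return Counter(value for it in items for value in it.get(facet, ()))


thai = select(RECIPES, {"Cuisine": "Thai"})
print(facet_counts(thai, "Course"))
print([r["title"] for r in select(RECIPES, {"Cuisine": "Thai", "Ingredient": "Chicken"})])
```

Because constraints from different facets combine freely and in any order, the same collection can be re-arranged from many angles while each view stays reproducible from its constraint set.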
46
Nobel Prize Winners Collection
54
New Site: eBay Express
62
Is This Visualization? Prior experience and other people’s attempts seem to suggest that fewer graphics and more text is better. Details of layout, font and color contrast, label selection, and interaction make all the difference.
63
Earlier Variation on the Idea Cat-a-Cone, 1997
64
Mobile Variation FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06 Well-received, but visualization part not looked at.
65
Flexibility in SenseMaking: DLITE by Cousins et al. ’97; Sandbox by Wright et al. ’06
66
Flexibility in Sensemaking: TRIST, Jonkers et al. ’05. TRIST (The Rapid Information Scanning Tool) is the work space for Information Retrieval and Information Triage. Labeled features: Query History; Entities; Dimensions; Launch Queries; Annotated Document Browser; Comparative Analysis of Answers and Content; User Defined and Automatic Categorization; Rapid Scanning with Context; Linked Multi-Dimensional Views; Speed Scanning.
67
Flexibility for Sensemaking Support Quick Emphasis of Items of Importance. Sandbox, Wright et al. ’06. Direct interaction with Gestures (no dialog, no controls). Dynamic Analytical Models. Assertions with Proving/Disproving Gates.
68
Communication-Centric Text: Email, conversations, blogs. The first thought is usually nodes and links, but this doesn’t have the desired flexibility. Some alternatives: The Network Multivariate Networks
69
Re-envisioning Networks Viewing people’s shared workplaces, hometowns, schools over time. www.theyrule.net:
70
Re-envisioning Networks First cut: Hastings, Snow, and King ’05
71
Re-envisioning Networks Better version: Hastings, Snow, and King ’05
72
Re-envisioning Networks Wattenberg ’06 OLAP on directed labeled graphs
73
Network Flexibility
74
Martin Wattenberg, “Visual Exploration of Multivariate Graphs” [Figure axis labels: M, F; Location A, Location B, Location C, Location D, Location E]
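One way to read the "OLAP on directed labeled graphs" idea is as a roll-up: nodes that share attribute values are merged, and the edges between them are summed. The sketch below illustrates that reading only; it is not Wattenberg's implementation, and the people, attribute names (`gender`, `location`), and edges are hypothetical.

```python
from collections import Counter

# Hypothetical people with two categorical attributes.
PEOPLE = {
    "ann": {"gender": "F", "location": "A"},
    "bob": {"gender": "M", "location": "A"},
    "cat": {"gender": "F", "location": "B"},
    "dan": {"gender": "M", "location": "B"},
}

# Hypothetical directed edges, e.g. (sender, receiver) of a message.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "dan"), ("dan", "ann")]


def roll_up(nodes, edges, attrs):
    """Aggregate a graph by the given node attributes.

    Returns (node_sizes, edge_weights): how many original nodes fall into
    each attribute combination, and how many original edges run between
    each ordered pair of combinations."""
    def key(name):
        return tuple(nodes[name][a] for a in attrs)

    node_sizes = Counter(key(n) for n in nodes)
    edge_weights = Counter((key(src), key(dst)) for src, dst in edges)
    return node_sizes, edge_weights


by_gender = roll_up(PEOPLE, EDGES, ["gender"])
by_location = roll_up(PEOPLE, EDGES, ["location"])
print(by_gender[1])
print(by_location[1])
```

Switching the `attrs` list re-arranges the same network along a different dimension, and the result for any given list is always the same, which matches the talk's pairing of flexibility with reproducibility.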
75
Re-envisioning Networks Idea: vary these ideas to apply to email and other communication text.
76
Summary: Text Viz Design Guidelines. An emphasis on flexible views on text data. Emphasize brushing and linking using appropriate visual cues. Interaction flow should guide the user but also be flexible. Information structure should be consistent and reproducible. Other guidelines: Make text visible. Visual components should reflect the data and tasks.
77
Thank you! www.sims.berkeley.edu/~hearst