Marti Hearst School of Information, UC Berkeley Visualization in Text Analysis Problems VAC Consortium Meeting Stanford, May 24, 2006.

1 Marti Hearst School of Information, UC Berkeley Visualization in Text Analysis Problems VAC Consortium Meeting Stanford, May 24, 2006

2 Outline Some Visualization Design Principles • Illustrated with a new example Why Text is Tricky to Visualize How to do good visualization design with text while meeting analysts’ needs? • Focus on Flexibility with Reproducibility • Examples from 4 different domains

3 What Makes for a Good Visualization? Visually illuminates important aspects of the underlying data and domain. Supports the users’ tasks (better than without the visualization). Adheres to good design principles.

4 Example from Software Engineering Marat Boshernitsan, UC Berkeley PhD Dissertation 2006 Problem: need to make complex changes throughout code. • Example: convert from one API to another.

5 A Typical Solution Either requires programmers to understand and manipulate abstract syntax trees … Or requires learning another programming language (or both)!

6 First Attempt

7 Second Attempt

8 A Better Solution Build on how programmers think about programming. • Operate on the textual representation of code.

9 Users Operate on Familiar Visual Representation of Code

10 Context-and-Domain Sensitive Visual Cues

11 Lessons from this Example User-centered Design • This was the third attempt. • First 2 attempts did not accurately reflect how users think about the problem. • Careful design of labels and interaction cues • Very intelligent backend, but user-activated. Visually and interactively reflects how programmers think about programming.

12 What Makes for a Good Visualization for Analysts? Visually illuminates important aspects of the underlying data and domain. Supports the users’ tasks (better than without the visualization). Adheres to good design principles.

13 Goals vs. Tasks Analysts’ Goals: • Understand current and past situations • Predict and anticipate future situations Observations by Pirolli & Card ’05: Different analysts starting with people, organizations, tasks, and time: • predict coup likelihood • understand bio-warfare threats • understand relations within cartel

14 Goals vs. Tasks Analysts’ tasks: • Explore • Extract • Filter • Link • Arrange • Compare • Hypothesize (A combination of Foraging and Sensemaking) Should do the tasks only to support the goals.

15 Design Principles for Analysts Experienced analysts notice what is missing or unexpected (Wright et al. ’06) Thus consistency and reproducibility are important.

16 Design Principles for Analysts Analysts must guard against confirmation bias. (Pirolli & Card ’05) Thus it is important for analysts to • Be able to easily arrange and re-arrange, • View information flexibly from many angles, While at the same time retaining consistency and reproducibility. However … it’s hard to do this with text.

17 Working with Text Text is especially difficult to visualize Very high dimensionality • Tens to hundreds of thousands of features Compositional • Can be combined together in innumerable ways Abstract • And so difficult to visualize Not pre-attentive • Must foveate to read Subtle • Small differences matter Unordered

18 Text Meaning is NOT pre-attentive SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC

19 Why Text is Tough Abstract concepts are difficult to visualize Combinations of abstract concepts are even more difficult to visualize • time • shades of meaning • social and psychological concepts • causal relationships

20 Why Text is Tough The dog.

21 Why Text is Tough The dog. The dog cavorts. The dog cavorted.

22 Why Text is Tough The man. The man walks.

23 Why Text is Tough The man walks the cavorting dog. So far, we can sort of show this in pictures.

24 Why Text is Tough As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls. How do we visualize this?

25 Why Text is Tough Language only hints at meaning Most meaning of text lies within our minds and common understanding • “How much is that doggy in the window?” “how much” implies a social system of barter and trade (not the size of the dog); “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own; “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

26 Why Text is Tough General categories have no standard ordering (nominal data) Categorization of documents by single topics misses important distinctions Consider an article about • NAFTA • The effects of NAFTA on truck manufacture • The effects of NAFTA on productivity of truck manufacture in the neighboring cities of El Paso and Juarez

27 Other issues about language • Ambiguous (many different meanings for the same words and phrases) • Same meaning implied by different combinations • Different combinations imply different meanings

28 Why Text is (Deceptively) Easy Text is easier when you have a lot of it • Web search now usually treats a query as a conjunction of its terms Text has a lot of redundancy • A very simple algorithm can: Pull out “important” phrases; Find “meaningfully” related words; Create a “summary” from a document; Group “related” documents

29 Why Text is Easy Pretty much any simple technique can pull out phrases that seem to characterize a document Most frequent words from an IR lecture: 109 slide, 69 to, 37 view, 37 version, 37 graphic, 37 first, 37 back, 36 previous, 36 next, 32 of, 31 the, 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents, 21 and, 18 evaluate, 15 a, 13 what, 13 vs, 13 how, 12 trec, 12 is, 12 high, 12 for, 10 relevance, 10 queries, 10 on, 9 information, 8 x, 8 why, 8 as, 8 answer, 7 search, 7 maron, 7 document, 7 blair, 6 top, 6 results, 6 measure, 6 length, 6 in, 6 evaluation, 6 curves

30 Why Text is Easy Same text, removing most frequent words in language and most frequent in this text: 30 recall, 28 relevant, 27 precision, 25 retrieved, 25 documents, 18 evaluate, 13 vs, 12 trec, 12 high, 10 relevance, 10 queries, 9 information, 8 x, 8 answer, 7 search, 7 maron, 7 document, 7 blair, 6 top, 6 results, 6 measure, 6 length, 6 evaluation, 6 curves These words can act as a simple summary of the document • People are good at inferring (sometimes inventing) the commonalities • People are bad at realizing what they are not seeing
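
The two word lists above can be approximated with a few lines of code. The following is a minimal sketch, not the procedure actually used for the slides: the function name `top_terms`, the tokenizer, and the stop list are assumptions. The stop list is simply the set of common English words that appear on slide 29 but not on slide 30, and the document-specific words (navigation text such as "slide" and "next") are passed in separately.

```python
import re
from collections import Counter

# Assumed stop list: the common English words present on slide 29 but absent from slide 30.
LANGUAGE_STOP_WORDS = {
    "to", "of", "the", "and", "a", "what", "how",
    "is", "for", "on", "why", "as", "in",
}


def top_terms(text, k=10, document_stop_words=()):
    """Return the k most frequent terms after dropping both kinds of stop words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    stop = LANGUAGE_STOP_WORDS | set(document_stop_words)
    counts = Counter(word for word in words if word not in stop)
    return counts.most_common(k)


# Hypothetical sample text standing in for the lecture slides.
sample = "Next slide: recall and precision. Next slide: recall of the relevant documents."
print(top_terms(sample, k=3, document_stop_words={"slide", "next"}))
```

Without `document_stop_words`, navigation words such as "slide" dominate the counts, which is the effect slide 29 shows.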

31 Simple Text Analysis can Mislead Most frequent words • Biases towards concepts with unique identifiers. From Spink, Wolfram, Jansen, Saracevic, JASIS ’01

32 Major Trends vs. Minor Discoveries With text, it’s easy to extract and show the largest, main trends But often we want the rare but unexpected and important event: • Russian oil company example • Schwarzenegger and Enron • Cigarettes and kids • Person on the periphery who is working stealthily to influence things This is really difficult to solve!

33 Design Principles for Analysts Experienced analysts notice what is missing or unexpected. Analysts must guard against confirmation bias. • Need to be able to easily arrange and re-arrange, • View information flexibly from many angles, While at the same time retaining consistency and reproducibility. Interfaces should reflect the domain and data. How to achieve this with text collections? • Must transform text in understandable ways • Must provide multiple, consistent views that nevertheless allow for new discovery and insight

34 Why Emphasize Flexibility? Can’t view representations of all the text content at once. Instead, analysts need ways to flexibly navigate, group, organize, and explore, and to see important pieces over time.

35 The Importance of Flexibility Russell, Slaney, Qu, Houston ’05 The ease of viewing and manipulation in the system strongly influenced the kind of analysis operations done.

36 Examples of Flexibility on Text Data PaperLens (Conference proceedings) TAKMI (Customer service requests) Faceted Browsing (e-commerce) • Flamenco • eBay Express • FaThumb TRIST and Sandbox (Analysts)

37 Flexible views InfoVis 2004 contest • Visualize 8 years of conference proceedings • Tasks: 1. Static Overview of 10 years of InfoVis 2. Characterize the research areas and their evolution 3. The people in InfoVis 4. Which papers/authors are most often referenced? 5. How many papers conducted a user study? PaperLens: integrated solution by Lee, Czerwinski, Robertson, Bederson. Uses graphical elements and brushing and linking to flexibly elucidate a collection’s contents. • http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml

38

39

40 Flexibility in Foraging and Analysis TAKMI, by Nasukawa and Nagano, ’01 The system integrates: • Analysis tasks (customer service help) • Content analysis • Information Visualization

41 Flexibility in Analysis TAKMI, by Nasukawa and Nagano, 2001 Documents containing “windows 98”

42 Flexibility in Analysis TAKMI, by Nasukawa and Nagano, 2001 Patent documents containing “inkjet”, organized by entity and year

43 Flexibility in Category Navigation Browsing Information Collections using (Hierarchical) Faceted Metadata

44 What are facets? Sets of categories, each of which describes a different aspect of the objects in the collection. Each of these can be hierarchical. (Not necessarily mutually exclusive nor exhaustive, but often that is a goal.) Examples: Time/Date, Topic, Geo Region

45 Facet example: Recipes Course Main Course Cooking Method Stir-fry Cuisine Thai Ingredient Red Bell Pepper Curry Chicken
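
To make the facet idea concrete, here is a minimal sketch of faceted refinement over the recipe example. It is not how Flamenco or any other system named in this talk is implemented. The records, the helper names (`refine`, `facet_counts`), and the flat, non-hierarchical facets are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical recipe records; the facet names come from the slide, the data is invented.
RECIPES = [
    {"name": "Thai red curry chicken", "Course": "Main Course",
     "Cooking Method": "Stir-fry", "Cuisine": "Thai",
     "Ingredient": {"Red Bell Pepper", "Curry", "Chicken"}},
    {"name": "Thai basil chicken", "Course": "Main Course",
     "Cooking Method": "Stir-fry", "Cuisine": "Thai",
     "Ingredient": {"Basil", "Chicken"}},
    {"name": "Roast chicken", "Course": "Main Course",
     "Cooking Method": "Roast", "Cuisine": "French",
     "Ingredient": {"Chicken"}},
]


def _values(item, facet):
    """Return an item's values for a facet as a set (a facet may be multi-valued)."""
    value = item.get(facet)
    if value is None:
        return set()
    return set(value) if isinstance(value, (set, frozenset)) else {value}


def refine(items, selections):
    """Keep items that match every selected facet value (a conjunction across facets)."""
    return [item for item in items
            if all(wanted in _values(item, facet)
                   for facet, wanted in selections.items())]


def facet_counts(items, facet):
    """Count the items under each value of a facet, as a preview of the next refinement."""
    counts = Counter()
    for item in items:
        counts.update(_values(item, facet))
    return counts


thai = refine(RECIPES, {"Cuisine": "Thai"})
print([recipe["name"] for recipe in thai])
print(facet_counts(thai, "Ingredient"))
```

Recomputing the counts after each selection is what lets an interface show only the refinements that still lead to results, so the same selections always reproduce the same view.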

46 Nobel Prize Winners Collection

47

48

49

50

51

52

53

54 New Site: eBay Express

55

56

57

58

59

60

61

62 Is This Visualization? Prior experience and other people’s attempts seem to suggest that fewer graphics and more text is better. Details of layout, font and color contrast, label selection, and interaction make all the difference.

63 Earlier Variation on the Idea Cat-a-Cone, 1997

64 Mobile Variation FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06 Well-received, but visualization part not looked at.

65 Flexibility in SenseMaking DLITE by Cousins et al. ‘97 Sandbox by Wright et al. ‘06

66 Flexibility in Sensemaking TRIST, Jonkers et al. ’05 TRIST (The Rapid Information Scanning Tool) is the work space for Information Retrieval and Information Triage. Figure labels: Query History; Entities; Dimensions; Launch Queries; Annotated Document Browser; Comparative Analysis of Answers and Content; User Defined and Automatic Categorization; Rapid Scanning with Context; Linked Multi-Dimensional Views; Speed Scanning

67 Flexibility for Sensemaking Support Quick Emphasis of Items of Importance. Sandbox, Wright et al ‘06 Direct interaction with Gestures (no dialog, no controls). Dynamic Analytical Models. Assertions with Proving/Disproving Gates.

68 Communication-Centric Text Email, conversations, blogs • The first thought is usually nodes and links • Doesn’t have the desired flexibility Some alternatives: • The Network • Multivariate Networks

69 Re-envisioning Networks Viewing people’s shared workplaces, hometowns, schools over time. • www.theyrule.net

70 Re-envisioning Networks First cut: Hastings, Snow, and King ’05

71 Re-envisioning Networks Better version: Hastings, Snow, and King ’05

72 Re-envisioning Networks Wattenberg ’06 OLAP on directed labeled graphs

73 Network Flexibility

74 Martin Wattenberg, “Visual Exploration of Multivariate Graphs” Figure axis labels: M, F; Location A, Location B, Location C, Location D, Location E
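
The "OLAP on directed labeled graphs" idea can be read as a roll-up: nodes that share an attribute value are merged into one group, and the edges between groups are counted. The sketch below illustrates only that aggregation step and makes no claim about how Wattenberg's system is implemented. The function name `roll_up`, the node names, and the attribute values are hypothetical.

```python
from collections import Counter


def roll_up(nodes, edges, attribute):
    """Aggregate a directed graph by one node attribute.

    nodes: dict mapping node id -> dict of attributes
    edges: iterable of (source id, target id) pairs
    Returns (group sizes, edge counts between groups).
    """
    group_sizes = Counter(attrs[attribute] for attrs in nodes.values())
    edge_counts = Counter(
        (nodes[source][attribute], nodes[target][attribute])
        for source, target in edges
    )
    return group_sizes, edge_counts


# Hypothetical communication graph: who sent a message to whom.
people = {
    "ann": {"gender": "F", "location": "A"},
    "bob": {"gender": "M", "location": "A"},
    "cat": {"gender": "F", "location": "B"},
}
messages = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]

sizes, flows = roll_up(people, messages, "gender")
print(sizes)
print(flows)
```

Calling `roll_up(people, messages, "location")` on the same data gives a different, equally reproducible view, which is the kind of flexible re-arrangement the talk argues for.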

75 Re-envisioning Networks Idea: vary these ideas to apply to email and other communication text.

76 Summary: Text Viz Design Guidelines An emphasis on flexible views on text data • Emphasize brushing and linking using appropriate visual cues. • Interaction flow should guide the user but also be flexible. • Information structure should be consistent and reproducible. Other guidelines: • Make text visible. • Visual components should reflect the data and tasks.

77 Thank you! www.sims.berkeley.edu/~hearst

