Marti Hearst SIMS 247 SIMS 247 Lecture 20 Visualizing Text & Text Collections (cont.) April 2, 1998.

1 Marti Hearst, SIMS 247 Lecture 20: Visualizing Text & Text Collections (cont.), April 2, 1998

2 Today
Visualizing Collection Overviews (cont.)
Visualizing Query Specifications
  – Selecting Term Subsets
  – Viewing Metadata
Visualizing Retrieval Results
  – Show Hyperlink Structure (WebCutter)
  – Term Hit Distribution (TileBars, SeeSoft)
  – Group by Shared Metadata (Cat-a-Cone)

3 Showing Collection Overviews
From last time:
  – Show documents as icons
  – Link together or place near one another according to:
      inter-document similarity
      hyperlink structure
      citation structure
  – Advantages
      can see large grouping patterns
  – Disadvantages
      what do the groups mean?
      documents usually belong in multiple groups
      groups are often somewhat arbitrary

4 Mapping Documents onto Landscapes (Chalmers 96)

5 Visualizing Query Term Specification
Query term intersection
  – VIBE
  – InfoCrystal
Incremental Term Specification
  – Lyberworld
  – WSJ online interface

6 Visualizing Query Term Intersection
VIBE
  – establish points of interest (POI) on a 2D plane
  – these correspond to terms or concepts
  – position documents according to their intersections among POI
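One way to read the VIBE placement rule is as a weighted average: a document sits at the centroid of the POI positions, weighted by how strongly it matches each POI. The Python sketch below works under that assumption. The function name, coordinates, and scores are hypothetical, and the actual VIBE system may compute positions differently.

```python
def place_document(poi_positions, scores):
    """Place a document at the score-weighted centroid of the POIs.

    poi_positions: dict mapping POI name -> (x, y) on the 2D plane
    scores: dict mapping POI name -> how strongly the document matches it
    Returns (x, y), or None when the document matches no POI.
    """
    total = sum(scores.get(name, 0.0) for name in poi_positions)
    if total == 0:
        return None  # no match, so the document has no position
    x = sum(scores.get(name, 0.0) * px
            for name, (px, py) in poi_positions.items()) / total
    y = sum(scores.get(name, 0.0) * py
            for name, (px, py) in poi_positions.items()) / total
    return (x, y)


# Hypothetical layout with three POIs.
pois = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (0.0, 10.0)}
only_a = place_document(pois, {"A": 1.0})             # lands on POI A
a_and_b = place_document(pois, {"A": 1.0, "B": 1.0})  # midway between A and B
print(only_a, a_and_b)
```

Under this reading, a document that matches a single POI sits on top of it, and a document that matches several is pulled toward the stronger matches.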

7 VIBE (Olsen et al. 93)
[Figure: points of interest A, B, C, and D; numeric labels 9, 22, 3, 3]

8 Visualizing Query Term Intersection
InfoCrystal
  – convert and extend Venn diagrams
  – show how many docs contain each subset of up to five terms
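The per-subset counts can be sketched in Python as follows. This assumes each document is counted in exactly one region, as in a Venn diagram: the region for the exact set of query terms it contains. The function name and the toy documents are hypothetical.

```python
from collections import Counter


def subset_counts(docs, query_terms):
    """Count documents by the exact subset of query terms they contain.

    docs: iterable of sets of words, one set per document
    query_terms: list of query terms (the slide mentions up to five)
    Returns a Counter keyed by frozenset of the terms present.
    """
    counts = Counter()
    for words in docs:
        present = frozenset(t for t in query_terms if t in words)
        if present:  # skip documents that match no query term
            counts[present] += 1
    return counts


# Hypothetical toy collection.
docs = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b", "d"}, {"x"}]
counts = subset_counts(docs, ["a", "b", "c", "d"])
for subset, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), n)
```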

9 InfoCrystal (Spoerri 93)
[Figure: InfoCrystal for terms A, B, C, and D; numeric labels 19, 201, 34; callouts: "# of docs containing A", "# of docs containing B and D", "# of docs containing A, C, and B"]

10 Hyperlink Relevant Documents
WebCutter (Maarek & Shaul 97)
Show documents as icons or text labels
Choose a starting point for search
Find documents that are linked to starting point and are most relevant to query
Continue searching most promising links
Show link structure graphically
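The "continue searching most promising links" step resembles a generic best-first search over the link graph. The Python sketch below shows that general technique, not WebCutter's actual algorithm. The link graph, relevance scores, and function name are all hypothetical.

```python
import heapq


def best_first_crawl(start, links, relevance, max_pages=10):
    """Visit pages reachable from `start`, most relevant first.

    links: dict mapping page -> list of pages it links to
    relevance: function mapping page -> query relevance score
    Returns the pages in the order they were visited.
    """
    visited = []
    seen = {start}
    # heapq is a min-heap, so store negated scores to pop the best page.
    frontier = [(-relevance(start), start)]
    while frontier and len(visited) < max_pages:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-relevance(target), target))
    return visited


# Hypothetical link graph and relevance scores.
links = {"home": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
scores = {"home": 0.1, "a": 0.2, "b": 0.9, "c": 0.5, "d": 0.8}
order = best_first_crawl("home", links, scores.get)
print(order)
```

The visit order doubles as the set of nodes and edges to draw, so the link structure shown to the user grows along the most relevant paths first.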

11 WebCutter (Maarek & Shaul 97)

12 WebCutter (Maarek & Shaul 97)

13 TileBars: Viewing Retrieval Results
Goal: minimize time/effort for deciding which documents to examine in detail
Idea: show the roles of the query terms in the retrieved documents, making use of document structure

14 TileBars
Graphical Representation of Term Distribution and Overlap
Simultaneously Indicate:
  – relative document length
  – query term frequencies
  – query term distributions
  – query term overlap
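The four properties listed for TileBars can all be read off a small grid: one row per query term, one column per document segment, and a cell value equal to the term's frequency in that segment. The Python sketch below assumes the document is already split into segments and uses naive whitespace tokenization. The function names, shade characters, and sample text are hypothetical.

```python
def tilebar(segments, query_terms):
    """Build one row per query term with the term's count in each segment.

    Row length reflects relative document length, and cell values reflect
    term frequency and distribution.
    """
    rows = {}
    for term in query_terms:
        # Exact-match token counting; punctuation is not handled.
        rows[term] = [seg.lower().split().count(term) for seg in segments]
    return rows


def overlap_columns(rows):
    """Return indices of segments in which every query term occurs."""
    width = len(next(iter(rows.values()), []))
    return [i for i in range(width) if all(r[i] > 0 for r in rows.values())]


def render(rows, shades=" .:#"):
    """Draw each row as text, with denser characters for higher counts."""
    lines = []
    for term, counts in rows.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        lines.append(f"{term:<12}|{cells}|")
    return "\n".join(lines)


# Hypothetical three-segment document.
segments = ["dbms reliability dbms", "banking loans", "dbms reliability"]
rows = tilebar(segments, ["dbms", "reliability"])
print(render(rows))
print("overlap in segments:", overlap_columns(rows))
```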

15 TileBars Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
  – Mainly about both DBMS & reliability
  – Mainly about DBMS, discusses reliability
  – Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
  – Mainly about high-tech layoffs

18 Exploiting Visual Properties
  – Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ‘83)
  – Varying shades of gray show varying quantities better than color (Tufte ‘83)
  – Differences in shading should align with the values being presented (Kosslyn et al. ‘83)

19 Represent Software as One-Dimensional Text
SeeSoft (Eick 94)
  – Originally for software development
  – Show lines of code graphically
      how often modified
      written by whom
      highlight search terms
  – Extend to text
      show locations of search terms
      show recurring features, e.g., characters in a story
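A rough Python sketch of the line-oriented view follows. It assumes each line of text becomes one thin row whose length tracks the line's length, with a different mark for lines that contain the search term. The function name, mark characters, and sample story are hypothetical, and SeeSoft itself uses color on a graphical display rather than characters.

```python
def line_view(text, term, width=40):
    """Return one bar per line of `text`.

    Bar length is proportional to the line's length (the longest line
    gets `width` characters). Lines containing `term` are drawn with
    '*', and all other lines with '-'.
    """
    lines = text.splitlines()
    longest = max((len(line) for line in lines), default=0)
    bars = []
    for line in lines:
        # Integer arithmetic keeps bar lengths exact.
        n = 0 if longest == 0 else len(line) * width // longest
        mark = "*" if term.lower() in line.lower() else "-"
        bars.append(mark * n)
    return bars


# Hypothetical three-line story; highlight where the character Bob appears.
story = "Alice met Bob\nshort\nBob left"
bars = line_view(story, "bob", width=13)
print("\n".join(bars))
```

Swapping the search term for a character's name gives the "characters in a story" use the slide mentions: the marked rows show where that character recurs.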

20 SeeSoft: Changes of Lines of Code over Time (Eick 94)

21 SeeSoft: Characters in Stories (Eick 94)

22 SeeDiff: Compare Differences between Two Files (Eick and Ball)

23 Alternative Way to Group Documents: Category MetaData
Last time we saw ways to visualize
  – clusters of documents
  – clusters of words taken from documents
Clusters are data-driven
  – depend on what documents were clustered
  – can find main themes
  – sometimes are hard to understand
Alternative: human-generated categories

24 What is Category Metadata for?
“Normalizing” natural language
  – distinguish homonyms
  – group synonyms together
Organizing information
  – for search
  – for browsing/navigation
Examples:
  – Yahoo directory
  – ACM keyword hierarchy

25 Example: MeSH and MedLine
MeSH Medical Category Hierarchy
  – ~18,000 labels
  – manually assigned
  – ~8 labels/article on average
  – avg depth: 4.5, max depth 9
Top Level Categories:
  anatomy    diagnosis   related disc
  animals    psych       technology
  disease    biology     humanities
  drugs      physics

26 What Categories Do
Summarize a document according to pre-defined main topics
Compress the many ways of representing a concept into one
Identify which subset of attributes are salient for a collection

27 Clusters vs. Categories
CLUSTERS                 CATEGORIES
Tailored to data         Pre-assigned
Overall themes           Particular attributes
Require interpretation   Familiar terminology

28 Large Category Sets
Problems for User Interfaces
  Too many categories to browse
  Too many docs per category
  Docs belong to multiple categories
  Need to integrate search
  Need to show the documents

29 Multiple Categories per Document
Drug   Symptom   Anatomy
D1     S1        A1
D2     S2        A2
D3     S3        A3
Medical articles contain combinations of these concept types

30 How to Group the Category Types? A Lattice is Infeasible
[D1 S3 A1] [D3 S2 S3] [D1 D2 S2 A2] …
Dx Sx Ax
Dx Sx A1   Dx S1 Ax   D1 Sx Ax
Dx S1 A1   D1 S1 Ax   D1 Sx A1
D1 S1 A1
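To see why the lattice grows out of hand, read "x" as a wildcard, so that "Dx" means "any drug". This reading is an assumption about the slide's notation. The Python sketch below enumerates every generalization of a single document's labels and counts lattice nodes for a given number of values per concept type. It further assumes at most one label per type, which understates the real problem, since the slide's documents can carry several labels of the same type. Function names are hypothetical.

```python
from itertools import product


def generalizations(doc_labels):
    """List every pattern that matches a document's labels.

    doc_labels: tuple with one label per concept type, e.g. ("D1", "S1", "A1").
    Each label is either kept or replaced by its wildcard (first letter + "x").
    """
    options = [(label, label[0] + "x") for label in doc_labels]
    return [" ".join(pattern) for pattern in product(*options)]


def lattice_size(values_per_type):
    """Count lattice nodes when each type holds one value or a wildcard."""
    size = 1
    for n in values_per_type:
        size *= n + 1  # n specific values plus the wildcard
    return size


patterns = generalizations(("D1", "S1", "A1"))
print(patterns)
print(lattice_size([3, 3, 3]))
```

A single document with three labels already belongs to eight lattice nodes, and the node count multiplies with every additional value or concept type.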

31 Cat-a-Cone: Interactive Category Interface (Hearst & Karadi 97)
Key: Separate representation of documents from categories
  – Place categories in 3D animated Tree
  – Collect retrieved documents into a reusable “Book”
  – Link categories from Book to Tree
  – Innovative query specification

32 Cat-a-Cone: Integrate Navigation and Search (Hearst & Karadi 97)
Interface that smoothly integrates
  – search over multiple categories
  – search over document contents
  – browsing of multiple categories
  – browsing of retrieved documents
Iterative, Interactive

33 [Diagram labels: Collection, Retrieved Documents, search, Category Hierarchy, browse, query terms]

34 [Diagram labels: Collection, Retrieved Documents, search, Category Hierarchy, browse, query terms]

35 Cat-a-Cone (Hearst & Karadi 97)

36 ConeTree for Category Labels
Browse/explore category hierarchy
  – by search on label names
  – by growing/shrinking subtrees
  – by spinning subtrees
Affordances
  – learn meaning via ancestors, siblings
  – disambiguate meanings
  – all cats simultaneously viewable

37 Virtual Book for Result Sets
  – Categories on Page (Retrieved Document) linked to Categories in Tree
  – Flipping through Book Pages causes some Subtrees to Expand and Contract
  – Most Subtrees remain unchanged
  – Book can be Stored for later Re-Use
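The page-to-tree linkage can be illustrated with a small Python sketch. It assumes the tree is stored as a child-to-parent map and that showing a category means expanding every node on its path to the root. This illustrates the idea only and is not the Cat-a-Cone implementation. The category names and function names are hypothetical.

```python
def nodes_to_show(parent, categories):
    """Return the categories on a page plus all of their ancestors."""
    shown = set()
    for cat in categories:
        node = cat
        while node is not None:
            shown.add(node)
            node = parent.get(node)  # the root maps to None, ending the walk
    return shown


def flip_page(parent, old_cats, new_cats):
    """Work out which tree nodes change when the book page changes."""
    before = nodes_to_show(parent, old_cats)
    after = nodes_to_show(parent, new_cats)
    return {
        "expand": after - before,
        "contract": before - after,
        "unchanged": before & after,
    }


# Hypothetical category tree, stored as child -> parent.
parent = {
    "root": None,
    "disease": "root", "drugs": "root",
    "cancer": "disease", "flu": "disease",
    "aspirin": "drugs",
}
change = flip_page(parent, ["cancer", "aspirin"], ["flu", "aspirin"])
print(change)
```

In this toy case only one node expands and one contracts while the shared ancestors stay put, which matches the slide's point that most subtrees remain unchanged between pages.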

38 Cat-a-Cone (Hearst & Karadi 97)
Catacomb: (definition 2b, online Websters) “A complex set of interrelated things”
Makes use of earlier PARC work on 3D+animation:
  Rooms (Henderson and Card 86)
  IV: Cone Tree (Robertson, Card, Mackinlay 93)
  Web Book (Card, Robertson, York 96)

39 Summary: Cat-a-Cone
Interface that smoothly integrates
  – search over multiple categories
  – search over document contents
  – browsing of multiple categories
  – browsing of retrieved documents
Iterative, Interactive
Retain partial results in a workspace

40 Summary: Visualizing Text
Text is difficult to visualize
  – represents abstract concepts
  – many combinations of these abstract concepts
Main visualization approaches:
  – collection overviews based on 2D or 3D views of document clusters
  – graphical displays of relationships to query terms (for information access)
  – graphical displays of relationships to category subsets
Open Questions:
  – How to walk the border between useful and gratuitous graphics?
  – Is anything better than showing titles?

