1
Visualizing Document Collections
cs5764: Information Visualization Chris North
2
Where are we?
Multi-D, 1D, 2D, 3D, Trees, Graphs, Document collections
Design Principles, Empirical Evaluation, Visual Overviews
3
Structured Document Collections
Multi-dimensional: author, title, date, journal, …
Trees: Dewey decimal
Graphs: web, citations
4
Envision Ed Fox, et al. Multi-D similar to Spotfire
5
Citation Networks
Butterfly Browser, Mackinlay et al (PARC)
Butterfly: Left = refs, Right = citers, Yellow = # citers, Blue = visited
3D plot: date, name, # citers
7
Unstructured Document Collections
Focus on full text
Examples: digital libraries, news archives, web page archives, image galleries
Tasks:
Search
Browse
Classification, structuring
Statistics, keyword usage, languages
Subjects, themes, coverage
8
Visualization Strategies
Cluster Maps
Keyword Query
Relationships
Reduced representation
User controlled layout
9
Cluster Map
Create a “map” of the document collection
Similar documents near each other
Dissimilar documents far apart
“Grocery store” concept
10
Document Vectors
Each document (Doc1, Doc2, …) is a vector over the terms (“aardvark”, “banana”, “chris”, …)
Similarity between a pair of docs = dot product of their vectors
Lay out documents in a 2-D map by similarity, similar to the spring model for graph layout
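The dot-product similarity can be sketched in a few lines. This is a minimal illustration, assuming each vector entry is a raw term count and that whitespace splitting is good enough as tokenization; the function names and the sample texts are hypothetical, not from the slides.

```python
from collections import Counter


def term_vector(text):
    """Map each lowercase word to its count (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def dot_similarity(a, b):
    """Dot product of two sparse term-count vectors."""
    # Counter returns 0 for terms missing from b, so only shared terms contribute.
    return sum(count * b[term] for term, count in a.items())


# Hypothetical sample documents.
doc1 = term_vector("banana banana chris")
doc2 = term_vector("banana aardvark")
doc3 = term_vector("aardvark aardvark")

print(dot_similarity(doc1, doc2))
print(dot_similarity(doc1, doc3))
```

Raw dot products favor long documents, so a real system would likely normalize the vectors first; that step is left out here to match the slide.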
11
Cluster Algorithms
Partition clustering: partition into k subsets
Pick k seeds
Iteratively attract nearest neighbors
Hierarchical clustering: dendrogram
Group nearest-neighbor pair
Iterate
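Both strategies can be sketched on 2-D points. This is a rough illustration under stated assumptions: the partition version is a k-means-style loop that takes the first k points as seeds, and the hierarchical version uses single linkage (closest pair of members) to decide which clusters are nearest. The slides do not specify either choice, and the function names are hypothetical.

```python
import math


def partition_cluster(points, k, iterations=10):
    """k-means-style partitioning: seeds attract their nearest points."""
    centers = [points[i] for i in range(k)]  # assumption: first k points are the seeds
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return clusters


def hierarchical_cluster(points):
    """Repeatedly merge the nearest pair of clusters; the merge list is the dendrogram."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        # Single linkage: cluster distance = distance between the closest members.
        i, j = min(
            pairs,
            key=lambda ij: min(
                math.dist(a, b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(partition_cluster(points, 2))
print(len(hierarchical_cluster(points)))
```

For documents, the points would be the term vectors and the distance would be derived from the similarity measure rather than Euclidean distance.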
12
Landscapes Wise et al, “Visualizing the non-visual”
ThemeScapes, Cartia PNNL Mountain height = Cluster size
13
Kohonen Maps Xia Lin, “Document Space”
15
WebSOM
16
Map.net
17
Galaxy of News MIT Cluster map with full text zooming
18
Cluster Map
Good:
Map of collection
Major themes and sizes
Relationships between themes
Scales up
Bad:
Where to locate documents with multiple themes? Both mountains, between mountains, …?
Relationships between documents, within documents?
Algorithm becomes (too) critical
19
Keyword Query
Keyword query, search engine (“Information Retrieval”)
Rank-ordered list
Visualization of results
20
Keyword Distributions
Hearst, “TileBars” Keyword distributions within documents
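The idea of a keyword distribution within a document can be sketched by cutting the text into tiles and counting each query term per tile. This is a simplified illustration: it assumes fixed-size word windows as tiles, which is not necessarily how TileBars segments text, and the function name and sample text are hypothetical.

```python
def tile_counts(text, terms, tile_size=50):
    """Count each query term in consecutive fixed-size word windows (assumed tiling)."""
    words = text.lower().split()
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    return {term: [tile.count(term) for tile in tiles] for term in terms}


# Hypothetical sample text, with a tiny tile size so the result is easy to read.
text = "cat dog cat bird fish dog dog cat"
counts = tile_counts(text, ["cat", "dog"], tile_size=4)
for term, row in counts.items():
    # One row per term, one cell per tile; a darker cell would mean more hits.
    print(term, row)
```

Each row then maps to one strip of cells in the display, so a reader can see where in the document the terms co-occur.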
21
Document Distributions
Korfhage, “VIBE” Documents located between query keywords using spring model
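Placing a document between query keywords can be sketched as a weighted average of the keyword positions. This is the equilibrium of springs whose stiffness is the document's relevance to each keyword; treating the springs as zero-rest-length is an assumption here, and the function name, anchor positions, and weights are hypothetical.

```python
def place_document(weights, anchors):
    """Return the (x, y) spring equilibrium of a document among keyword anchors.

    weights: keyword -> relevance of the document to that keyword
    anchors: keyword -> (x, y) screen position of the keyword
    """
    total = sum(weights.values())
    if total == 0:
        return None  # the document matches no keyword, so it has no position
    x = sum(w * anchors[k][0] for k, w in weights.items()) / total
    y = sum(w * anchors[k][1] for k, w in weights.items()) / total
    return (x, y)


# Hypothetical layout: two keywords on a horizontal line.
anchors = {"retrieval": (0.0, 0.0), "visualization": (10.0, 0.0)}
print(place_document({"retrieval": 1, "visualization": 3}, anchors))
```

A document that is three times more relevant to one keyword ends up three quarters of the way toward it, so position alone tells the reader the mix of keywords.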
22
VR-VIBE
23
Keyword Query
Good:
Reduces the browsing space
Map according to user’s interests
Bad:
What keywords do I use?
What about other related documents that don’t use these keywords?
No initial overview
Mega-hit, zero-hit problem
24
Relationships
Show inter-relationships
Matrix or complete graph (Salton)
Similarity measure between all pairs of docs
Threshold level
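The all-pairs similarity with a threshold can be sketched as follows. Cosine similarity is used as the measure, which is an assumption since the slide does not name one; the function names and sample vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse term-count dicts (assumed measure)."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_links(vectors, threshold):
    """Keep only the document pairs whose similarity reaches the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(vectors), 2)
        if cosine(vectors[x], vectors[y]) >= threshold
    ]


# Hypothetical term-count vectors.
vectors = {
    "d1": {"aardvark": 1},
    "d2": {"aardvark": 2},
    "d3": {"banana": 1},
}
print(similarity_links(vectors, 0.5))
```

Raising the threshold thins the graph; lowering it moves the picture toward the complete graph.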
25
Variations Docs + Paragraphs Themes
26
Relationships
Better for smaller, more detailed map
Scale up: network visualization
Good:
Can see more complex relationships between/within documents
Can act like hyperlinks!
Bad:
Finding specific documents
Scale up difficult
27
Reduced Visual Representation
Bederson, “Image browsing”
28
User Controlled Layout
Card, “WebBook and Web Forager”
29
Data Mountain Robertson, “Data Mountain” (Microsoft)