Visualizing Document Collections


1 Visualizing Document Collections
cs5764: Information Visualization Chris North

2 Where are we?
Multi-D, 1D, 2D, 3D, Trees, Graphs, Document collections
Design Principles, Empirical Evaluation, Visual Overviews

3 Structured Document Collections
Multi-dimensional: author, title, date, journal, …
Trees: Dewey decimal
Graphs: web, citations

4 Envision
Ed Fox et al.
Multi-D, similar to Spotfire

5 Citation Networks
Butterfly Browser, Mackinlay et al. (PARC)
Butterfly: Left = refs, Right = citers, Yellow = # citers, Blue = visited
3D plot: date, name, # citers

6

7 Unstructured Document Collections
Focus on full text
Examples: digital libraries, news archives, web page archives, image galleries
Tasks: search; browse; classification, structuring; statistics, keyword usage, languages; subjects, themes, coverage

8 Visualization Strategies
Cluster Maps
Keyword Query
Relationships
Reduced Representation
User Controlled Layout

9 Cluster Map
Create a “map” of the document collection
Similar documents near each other
Dissimilar documents far apart
“Grocery store” concept

10 Document Vectors
Table: documents (Doc1, Doc2, Doc…) against terms (“aardvark”, “banana”, “chris”)
Similarity between pair of docs = dot product
Lay out documents in 2-D map by similarity, similar to spring model for graph layout
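A minimal sketch of the dot-product similarity named on this slide. The three sample texts, the helper names `term_vector` and `dot_product`, and the use of raw term counts as vector entries are assumptions made for illustration; the slide does not say how the vector entries are weighted.

```python
from collections import Counter

def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def dot_product(u, v):
    """Similarity between two document vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy collection over the slide's three example terms.
vocabulary = ["aardvark", "banana", "chris"]
docs = {
    "Doc1": "aardvark banana banana",
    "Doc2": "banana chris",
    "Doc3": "chris chris aardvark",
}

vectors = {name: term_vector(text, vocabulary) for name, text in docs.items()}

# Similarity for every unordered pair of documents.
similarity = {
    (a, b): dot_product(vectors[a], vectors[b])
    for a in docs for b in docs if a < b
}

for pair, score in sorted(similarity.items()):
    print(pair, score)
```

Pairs with a higher score would be pulled closer together in the 2-D layout; the layout step itself is not shown here.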

11 Cluster Algorithms
Partition clustering: partition into k subsets. Pick k seeds; iteratively attract nearest neighbors.
Hierarchical clustering: dendrogram. Group nearest-neighbor pair; iterate.
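The two families above can be sketched as follows. This is an illustration under stated assumptions, not the method of any system in this deck: the partition step is written as k-means with the first k points as seeds, and the hierarchical step uses single-linkage merging. The function names and the sample points are hypothetical.

```python
import math

def partition_cluster(points, k, iterations=10):
    """k-means-style partition clustering (assumption: first k points are the seeds)."""
    centers = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Attract each point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of the points it attracted.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centers

def hierarchical_cluster(points):
    """Agglomerative clustering: repeatedly merge the closest pair of clusters.

    Returns the merges in order; read bottom-up, they form the dendrogram.
    Assumption: single linkage (distance between the closest members).
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: linkage(*pair),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
assignment, centers = partition_cluster(points, k=2)
merges = hierarchical_cluster(points)
print(assignment)
print(merges[-1])
```

In practice the points would be document vectors rather than 2-D coordinates, and the choice of seeds and linkage rule changes the resulting map.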

12 Landscapes
Wise et al., “Visualizing the non-visual”
ThemeScapes, Cartia, PNNL
Mountain height = cluster size

13 Kohonen Maps Xia Lin, “Document Space”

14

15 WebSOM

16 Map.net

17 Galaxy of News
MIT
Cluster map with full text zooming

18 Cluster Map
Good: Map of collection. Major themes and sizes. Relationships between themes. Scales up.
Bad: Where to locate documents with multiple themes? Both mountains, between mountains, …? Relationships between documents, within documents? Algorithm becomes (too) critical.

19 Keyword Query
Keyword query, search engine: “Information Retrieval”
Rank-ordered list
Visualization of results

20 Keyword Distributions
Hearst, “TileBars” Keyword distributions within documents
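A rough sketch of the idea of showing how query keywords are distributed within a document. It assumes fixed-size word windows as segments and a plain-text shading scale; both are simplifications chosen here for illustration and are not taken from the TileBars system, and the names `term_distribution`, `render` and `SHADES` are hypothetical.

```python
def term_distribution(text, terms, segment_size=5):
    """Count occurrences of each term in consecutive word windows of the text."""
    words = text.lower().split()
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    return {term: [segment.count(term) for segment in segments] for term in terms}

# Darker characters stand for more hits in a segment (assumed scale).
SHADES = " .:#"

def render(rows):
    """Turn per-segment counts into one shaded text bar per term."""
    return {
        term: "".join(SHADES[min(count, len(SHADES) - 1)] for count in counts)
        for term, counts in rows.items()
    }

# Hypothetical 12-word document, split into three 4-word segments.
doc = "cats chase mice and cats nap while dogs bark and dogs run"
rows = term_distribution(doc, ["cats", "dogs"], segment_size=4)
bars = render(rows)

for term in rows:
    print(f"{term:5s} |{bars[term]}|  {rows[term]}")
```

Each row reads left to right through the document, so a reader can see where two keywords co-occur (here, the middle segment) rather than only that both appear somewhere.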

21 Document Distributions
Korfhage, “VIBE” Documents located between query keywords using spring model
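One way to read "located between query keywords using spring model": fix each keyword at an anchor point and place a document at the score-weighted average of the anchors, which is the rest position of springs whose strength is proportional to the document's score for each keyword. That reading, the anchor coordinates, and the function name `vibe_position` are assumptions for illustration, not details from the VIBE system.

```python
def vibe_position(anchors, scores):
    """Place a document at the score-weighted average of the keyword anchors.

    anchors: keyword -> (x, y) fixed position on screen
    scores:  keyword -> non-negative relevance of the document to that keyword
    Returns None when the document matches no keyword at all.
    """
    total = sum(scores.values())
    if total == 0:
        return None
    x = sum(anchors[k][0] * s for k, s in scores.items()) / total
    y = sum(anchors[k][1] * s for k, s in scores.items()) / total
    return (x, y)

# Hypothetical keyword anchors at the corners of a triangle.
anchors = {
    "visualization": (0.0, 0.0),
    "retrieval": (10.0, 0.0),
    "clustering": (5.0, 10.0),
}

# Equally relevant to two keywords: lands midway between them.
between = vibe_position(anchors, {"visualization": 1, "retrieval": 1, "clustering": 0})

# Relevant to one keyword only: lands on that keyword's anchor.
on_anchor = vibe_position(anchors, {"visualization": 0, "retrieval": 3, "clustering": 0})

print(between, on_anchor)
```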

22 VR-VIBE

23 Keyword Query
Good: Reduces the browsing space. Map according to user’s interests.
Bad: What keywords do I use? What about other related documents that don’t use these keywords? No initial overview. Mega-hit, zero-hit problem.

24 Relationships
Show inter-relationships
Matrix or complete graph (Salton)
Similarity measure between all pairs of docs
Threshold level
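A small sketch of the all-pairs similarity plus threshold idea: compute a similarity for every pair of documents and keep a link only when it reaches the threshold. Cosine similarity, the toy vectors, and the names `cosine` and `similarity_links` are assumptions chosen for illustration; the slide does not specify the measure.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def similarity_links(vectors, threshold):
    """Return the document pairs whose similarity is at least the threshold."""
    names = sorted(vectors)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(vectors[a], vectors[b]) >= threshold
    ]

# Hypothetical term-count vectors: A and B share terms, C shares none.
vectors = {"A": [2, 1, 0], "B": [1, 1, 0], "C": [0, 0, 3]}

strong_links = similarity_links(vectors, threshold=0.5)
all_links = similarity_links(vectors, threshold=0.0)
print(strong_links)
print(all_links)
```

Lowering the threshold moves the picture toward the complete graph; raising it leaves only the strongest relationships.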

25 Variations Docs + Paragraphs Themes

26 Relationships
Better for smaller, more detailed map
Scale up: network visualization
Good: Can see more complex relationships between/within documents. Can act like hyperlinks!
Bad: Finding specific documents. Scale up difficult.

27 Reduced Visual Representation
Bederson, “Image browsing”

28 User Controlled Layout
Card, “WebBook and Web Forager”

29 Data Mountain Robertson, “Data Mountain” (Microsoft)

