cs5984: Information Visualization Chris North Document Collections 2
Approaches
  Clustering (last time)
    Themescapes, …
  Network
  Keyword
Clustering With Full text Galaxy of News pg 452
Clustering
  Good:
    Map of collection
    Major themes and sizes
    Relationships between themes
    Scales up
  Bad:
    Where to locate documents with multiple themes? Both mountains, between mountains, …?
    Relationships between documents, within documents?
    Algorithm becomes (too) critical
Network
  Show inter-relationships
  Matrix or Complete Graph
    Similarity measure between all pairs of docs
    Threshold level
  Salton, pg 413
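The matrix-plus-threshold idea can be sketched in a few lines. This is only an illustration, not the method from Salton: it assumes plain term-frequency vectors and cosine similarity as the similarity measure, and the names `cosine_similarity`, `similarity_network`, and the sample documents are made up for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_network(docs, threshold):
    """Compare all pairs of docs; keep an edge only if similarity >= threshold."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine_similarity(vectors[a], vectors[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


# Hypothetical sample collection
docs = {
    "d1": "information retrieval with keyword queries",
    "d2": "keyword queries for information retrieval systems",
    "d3": "spring model layout of graphs",
}
edges = similarity_network(docs, threshold=0.5)
for a, b, sim in edges:
    print(a, b, round(sim, 2))
```

Raising the threshold thins the graph out and lowering it approaches the complete graph, which is why the threshold level matters for readability.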
Variations Docs + Paragraphs Themes
Network
  Better for smaller, more detailed map
  Scale up: Network visualization
  Good:
    Can see more complex relationships between/within documents
    Can act like hyperlinks!
  Bad:
    Finding specific documents
    Scale up difficult
Combination: Thinkmap http://www.thinkmap.com/article.cfm?articleID=38
Keyword
  Search engine, keyword query “Information Retrieval”
  Rank ordered list “Information Retrieval”
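A minimal sketch of the query-to-ranked-list step, assuming the simplest possible scoring (count how often the query terms occur in each document). Real search engines use more refined weighting; the function name `rank_documents` and the sample documents are hypothetical.

```python
def rank_documents(docs, query):
    """Return document names ordered by descending query-term count."""
    terms = query.lower().split()
    scored = []
    for name, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:  # documents with no query term are not listed at all
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by name
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical sample collection
docs = {
    "a": "information retrieval and information visualization",
    "b": "retrieval of images",
    "c": "graph drawing",
}
ranked = rank_documents(docs, "Information Retrieval")
no_hits = rank_documents(docs, "tilebars")
print(ranked)
print(no_hits)
```

The second query returns an empty list: documents that do not contain the keywords never appear, however related they may be.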
Today
  Hearst, “Tilebars”, web
  umer, ashwini
VIBE
  Korfhage, http://www.pitt.edu/~korfhage/interfaces.html
  Documents located between query keywords using spring model
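One way to picture the spring model: attach the document to each keyword anchor with a zero-rest-length spring whose stiffness is the document's relevance to that keyword. The resting position is then the relevance-weighted average of the anchor positions. The sketch below assumes exactly that simplification; it is not taken from the VIBE implementation, and `vibe_position`, the anchor coordinates, and the weights are invented for illustration.

```python
def vibe_position(anchors, weights):
    """Place a document at the weighted centroid of the keyword anchors.

    anchors: keyword -> (x, y) screen position (assumed layout)
    weights: keyword -> non-negative relevance of the document to that keyword
    Returns None when the document matches no keyword.
    """
    total = sum(weights.get(k, 0.0) for k in anchors)
    if total == 0:
        return None
    x = sum(weights.get(k, 0.0) * pos[0] for k, pos in anchors.items()) / total
    y = sum(weights.get(k, 0.0) * pos[1] for k, pos in anchors.items()) / total
    return (x, y)


# Hypothetical keyword anchors arranged in a triangle
anchors = {
    "information": (0.0, 0.0),
    "retrieval": (10.0, 0.0),
    "visualization": (5.0, 10.0),
}

balanced = vibe_position(anchors, {"information": 1, "retrieval": 1})
pulled = vibe_position(anchors, {"information": 3, "retrieval": 1})
single = vibe_position(anchors, {"visualization": 2})
print(balanced, pulled, single)
```

A document equally relevant to two keywords lands midway between them, and one that matches a single keyword sits directly on that anchor. Note that two documents with different absolute weights but the same ratios end up at the same spot.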
VR-VIBE
InfoCrystal
  Spoerri, pg 140
  Venn Diagram, all possible combinations
    e.g. A&B&C&D, A&C&D, C&B, C
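The "all possible combinations" idea amounts to enumerating every non-empty subset of the query keywords and counting the documents that contain exactly that subset. The sketch below illustrates the bookkeeping only, not Spoerri's layout; `infocrystal_counts` and the toy documents are assumptions made for the example.

```python
from itertools import combinations


def infocrystal_counts(docs, keywords):
    """Count documents per exact keyword combination (one Venn region each)."""
    keywords = [k.lower() for k in keywords]
    counts = {}
    # One region for every non-empty subset of the keywords
    for size in range(1, len(keywords) + 1):
        for combo in combinations(keywords, size):
            counts[frozenset(combo)] = 0
    for text in docs.values():
        words = set(text.lower().split())
        present = frozenset(k for k in keywords if k in words)
        if present:  # documents matching no keyword fall outside the diagram
            counts[present] += 1
    return counts


# Hypothetical documents, using single letters as stand-in keywords
docs = {
    "d1": "a c d",
    "d2": "c b",
    "d3": "c",
    "d4": "a b c d",
    "d5": "x",
}
counts = infocrystal_counts(docs, ["A", "B", "C", "D"])
for region, n in sorted(counts.items(), key=lambda item: sorted(item[0])):
    if n:
        print("&".join(sorted(region).__iter__()).upper(), n)
```

With four keywords there are 2^4 - 1 = 15 regions, so the number of regions doubles with each added keyword.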
Keyword
  Good:
    Reduces the browsing space
    Map according to user’s interests
  Bad:
    What keywords do I use?
    What about other related documents that don’t use these keywords?
    No initial overview
    Mega-hit, zero-hit problem
Assignment
  Mid-Project status report: due today
  Read for Thurs
    Fox, “Envision”, web, video
    aejaaz, ravi
Upcoming Weeks
  I’m at CHI all next week
    Tues: Go to VE, SciViz lab: Torg 3050
      Bowman, Kriz, Kelso
    Thurs: McCrickard
  Read for Tues Apr 10
    DeFanti, “Scientific Visualization”, pg 39
    Sayle, “Rasmol”, web
    Yuying, ?