Document Collections
  CS 5984: Information Visualization
  Chris North
Where are we?
  Multi-D, 1D, 2D, Hierarchies/Trees, Networks/Graphs, Document collections, 3D
  Design Principles, Empirical Evaluation, Java Development, Visual Overviews, Multiple Views, Peripheral Views
Structured Document Collections
  Multi-dimensional: author, title, date, journal, …
  Trees: Dewey decimal
  Networks: web, citations
Envision
  Ed Fox, et al.
  Multi-D, similar to Spotfire
Unstructured Document Collections
  Focus on full text
  Examples: digital libraries, encyclopedia, Web, homepages, photo collections
  Tasks:
    Search: keyword
    Browse: themes, subjects, topics, library coverage; size, distributions
Visualization Strategies
  Cluster Maps (today)
  Keyword Query (today)
  Relationships
  Reduced representation
  User controlled layout
Cluster Map
  Create a “map” of the document collection
  Similar documents near
  Dissimilar documents far
  “Grocery store” concept
Document Vectors
                Doc1  Doc2  Doc3  …
  “aardvark”     1     2     0
  “banana”       2     1     0
  “chris”        0     0     3
  …
  Similarity between pair of docs =
  Lay out documents in a 2-D map by similarity, similar to the spring model for graph layout
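
The similarity formula on this slide did not survive, so the sketch below assumes cosine similarity, a common choice for term-count vectors; it is not necessarily the measure used in the lecture. It builds one vector per document from the example counts on the slide. The names `term_counts`, `document_vector`, and `cosine_similarity` are illustrative.

```python
import math

# Term counts from the slide's example: one row per term,
# one column per document (Doc1, Doc2, Doc3).
term_counts = {
    "aardvark": [1, 2, 0],
    "banana":   [2, 1, 0],
    "chris":    [0, 0, 3],
}


def document_vector(doc_index):
    """Return the term-count vector for one document (one column of the table)."""
    return [counts[doc_index] for counts in term_counts.values()]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (an assumed measure, not from the slide)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


doc1, doc2, doc3 = (document_vector(i) for i in range(3))
sim_12 = cosine_similarity(doc1, doc2)
sim_13 = cosine_similarity(doc1, doc3)
```

Under this assumption Doc1 and Doc2 score about 0.8, while Doc3 shares no terms with Doc1 and scores 0, so a similarity-based layout would place Doc1 and Doc2 close together and Doc3 far from both.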
Cluster Algorithms
  Partition clustering: partition into k subsets
    Pick k seeds
    Iteratively attract nearest neighbors
  Hierarchical clustering: dendrogram
    Group nearest-neighbor pair
    Iterate
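
Both strategies can be sketched in a few lines. The version below is a minimal illustration on 2-D points, not the algorithm of any particular system: the partition step is written in a k-means style (assign each point to its nearest seed, then move the seed to the mean of its members), and the hierarchical step assumes single linkage (cluster distance = distance of the closest pair of members). Function names and the sample points are hypothetical.

```python
import math


def partition_clustering(points, seeds, iterations=10):
    """Partition points into len(seeds) subsets, k-means style."""
    centers = [tuple(s) for s in seeds]
    clusters = []
    for _ in range(iterations):
        # Each seed "attracts" the points that are nearest to it.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (unchanged if empty).
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else center
            for members, center in zip(clusters, centers)
        ]
    return clusters


def hierarchical_clustering(points):
    """Merge the closest pair of clusters until one remains.

    Returns the merge history (left, right, distance), which is the
    information a dendrogram draws.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
clusters = partition_clustering(points, seeds=[points[0], points[2]])
merges = hierarchical_clustering(points)
```

With these sample points, the partition run separates the two lower-left points from the two upper-right ones, and the merge history records the two tight pairs first and the join of the two groups last. In a document map the points would be document vectors and the distance would come from the chosen similarity measure.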
Kohonen Maps
  Xia Lin, “Document Space” » samal, ying
Themescapes, Cartia
  PNL
  Mountain height = cluster size
WebSOM
Map.net
Cluster Map
  Good:
    Map of collection
    Major themes and sizes
    Relationships between themes
    Scales up
  Bad:
    Where to locate documents with multiple themes?
      » Both mountains, between mountains, …?
    Relationships between documents, within documents?
    Algorithm becomes (too) critical
Keyword Query
  Keyword query, search engine
  Rank-ordered list
  “Information Retrieval”
Tilebars
  Hearst, “Tilebars” » reenal, xueqi
VIBE
  Korfhage
  Documents located between query keywords using spring model
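
One simple way to read the spring model is the following sketch, which is an assumption about the general idea rather than VIBE's actual implementation: each query keyword is an anchor fixed on screen, and each document is pulled toward every anchor by a spring whose strength is the document's relevance to that keyword. With zero-rest-length springs, the equilibrium position is the relevance-weighted average of the anchor positions. The anchor coordinates, keywords, and weights below are made up for illustration.

```python
def spring_position(anchors, weights):
    """Equilibrium of a document pulled toward keyword anchors.

    anchors: keyword -> (x, y) screen position of the keyword.
    weights: keyword -> relevance of the document to that keyword.
    Returns the weighted average of the anchor positions, or None
    if the document matches none of the keywords.
    """
    total = sum(weights.values())
    if total == 0:
        return None  # no spring pulls on this document
    x = sum(w * anchors[k][0] for k, w in weights.items()) / total
    y = sum(w * anchors[k][1] for k, w in weights.items()) / total
    return (x, y)


# Hypothetical query with three keywords placed on screen.
anchors = {
    "visualization": (0.0, 0.0),
    "retrieval": (10.0, 0.0),
    "cluster": (5.0, 10.0),
}

# Hypothetical document: strongly about "visualization", a little about "retrieval".
doc_weights = {"visualization": 3, "retrieval": 1, "cluster": 0}
position = spring_position(anchors, doc_weights)
```

The sample document lands at (2.5, 0.0): on the line between "visualization" and "retrieval", three times closer to the keyword it matches more strongly. Documents that match every keyword equally collapse to the centroid of the anchors, which is one reason a layout like this lets the user move the anchors.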
VR-VIBE
Keyword Query
  Good:
    Reduces the browsing space
    Map according to user’s interests
  Bad:
    What keywords do I use?
    What about other related documents that don’t use these keywords?
    No initial overview
    Mega-hit, zero-hit problem
Assignment
  Thurs: Document Collections
    Bederson, “Image Browsing” » Rui, anusha
    Card, “Web Book and Web Forager” » mrinmayee, ming
  Demo your hw3: Tues or Thurs
Next Week
  Tues: 3-D data
    Kniss, “Interactive Volume Rendering with Direct Manip” » xueqi, mahesh
  Thurs: Workspaces
    Robertson, “Task Gallery” » supriya, varun
    Upson, “AVS” » christa, jun
  Thanksgiving break
  Tues 27: Debates
    Kobsa, “Empirical comparison of comm infovis systems” » kunal, zhiping
Upcoming Schedule
  Tues: 3-D data
  Thurs: Workspaces
  Thanksgiving break
  Tues 27: Debates
  Thurs 29: How (not) to lie with visualization
  Dec: project presentations
  Dec 7: CHI 2-pagers due, student posters due