cs5984: Information Visualization Chris North Document Collections
Data Spaces
  Multi-dimensional
  1D
  2D
  3D
  Trees
  Networks
  Structured document collections
Document Collections
  Unstructured document collections
  Focus on full text
  Examples:
    ACM Digital Library, IEEE
    Encyclopedia on CD-ROM
    Web search engine
  Tasks:
    Search by keywords
    Browse by topics
    Contents
    Size
    Related documents
    Document sections
Goal
  Create a “map” of the document collection
  Similar documents near each other
  Dissimilar documents far apart
  “Grocery store” concept
Vectorization
              Doc1  Doc2  Doc3
  Aardvark       1     2     0
  Banana         2     1     0
  Chris          0     0     3
  …
  Similarity of docs = mathematical comparison of the direction of the doc vectors
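
One way to make the vectorization slide concrete is sketched below. The three toy texts are invented so that their term counts reproduce the table above, and cosine similarity is assumed as the "comparison of direction"; the slide does not name a specific measure. The names `vectorize` and `cosine_similarity` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical vocabulary and toy texts, chosen to reproduce the slide's table.
VOCAB = ["aardvark", "banana", "chris"]
TEXTS = {
    "Doc1": "aardvark banana banana",
    "Doc2": "aardvark aardvark banana",
    "Doc3": "chris chris chris",
}


def vectorize(text, vocab=VOCAB):
    """Turn a text into a vector of term counts, one entry per vocabulary term."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocab]


def cosine_similarity(a, b):
    """Compare the direction of two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc1 = vectorize(TEXTS["Doc1"])  # [1, 2, 0]
doc2 = vectorize(TEXTS["Doc2"])  # [2, 1, 0]
doc3 = vectorize(TEXTS["Doc3"])  # [0, 0, 3]

print("Doc1 vs Doc2:", cosine_similarity(doc1, doc2))  # roughly 0.8
print("Doc1 vs Doc3:", cosine_similarity(doc1, doc3))  # 0.0, no shared terms
```

Because only direction is compared, a long document and a short one with the same mix of terms come out as highly similar.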
Map
  Lay out the documents’ n-D vectors onto a 2-D map
  Kohonen feature map (self-organizing map)
    Neural network iterates to lay out the documents (similar to the spring model for graph layout)
  Concept -> color & keyword
  # docs in concept -> size
  Concept similarity -> x,y
  Documents -> dots
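
The iterative layout can be illustrated with a very small self-organizing map. This is a minimal sketch under stated assumptions, not the method used by any of the systems shown in class: the grid size, the linearly decaying learning rate, the Gaussian neighborhood, and the names `unit`, `best_cell`, `train_som`, and `layout` are all choices made for illustration.

```python
import math
import random


def unit(v):
    """Scale a vector to length 1 so that only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def best_cell(grid, v):
    """Return the (row, col) of the grid cell whose weights are closest to v."""
    return min(grid, key=lambda cell: sum((w - x) ** 2 for w, x in zip(grid[cell], v)))


def train_som(vectors, rows=4, cols=4, steps=500, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    data = [unit(v) for v in vectors]
    dim = len(data[0])
    # Every cell of the 2-D map starts with a random n-D weight vector.
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for t in range(steps):
        remaining = 1.0 - t / steps
        rate = 0.5 * remaining                               # learning rate decays
        radius = 0.5 + (max(rows, cols) / 2.0) * remaining   # neighborhood shrinks
        v = rng.choice(data)
        br, bc = best_cell(grid, v)
        # Pull the winning cell and its grid neighbors toward the document.
        for (r, c), w in grid.items():
            grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
            pull = rate * math.exp(-grid_dist2 / (2.0 * radius ** 2))
            for i in range(dim):
                w[i] += pull * (v[i] - w[i])
    return grid


def layout(vectors, grid):
    """Place each document at the map cell that matches it best."""
    return [best_cell(grid, unit(v)) for v in vectors]


docs = [[1, 2, 0], [2, 1, 0], [0, 0, 3]]  # Doc1, Doc2, Doc3 from the table
grid = train_som(docs)
for name, cell in zip(["Doc1", "Doc2", "Doc3"], layout(docs, grid)):
    print(name, "->", cell)
```

Each document ends up as a dot at a grid position. Because neighboring cells are pulled toward the same documents during training, similar documents tend to land in nearby cells, which is the "similar documents near" goal of the map.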
Xia Lin
Web
  http://websom.hut.fi/websom/
  http://maps.map.net/start
Today
  Wise, “Themescapes”, book pg 442
    maulik, chris r
Assignment
  Read for Tues
    Hearst, “Tilebars”, web
      umer, ashwini
  Read for Thurs
    Fox, “Envision”, web
      aejaaz, ravi
  Homework #3: due today
  Mid-Project status report: due Tues