Concept Map: Clustering Visualizations of Categorical Domains. David Rouff and Mark McLean. CMSC 743 Information Visualization, Spring 07. Department of Computer Science, University of Maryland.
1. INTRODUCTION. Categorical data sets are hard to visualize: there is no inherent order, the size is variable, and each element and its associations is a dimension. The popular node-link diagram quickly becomes occluded. The adjacency matrix is hard to understand because related elements are disjoint. Concept Map visualizes categorical data with multiple, coordinated views; modified hierarchical clustering and SOM clustering; and tunable clustering and display parameters.
3. Design Approach. Mockups and prototypes guided development. Dendrogram/Adjacency Matrix in Matlab; SOM/Node Link in Flex.
Implementation of Dendrogram and Clustered Adjacency Matrix
Adaptations to Hierarchical Clustering: similarity distance calculation; FOIL
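As a rough illustration of how a similarity distance can drive hierarchical clustering and then reorder an adjacency matrix, here is a minimal sketch. It is not the Matlab implementation from the project: the Jaccard distance, the average-linkage rule, and all function names (jaccard_distance, hierarchical_order, reorder_matrix) are assumptions chosen for the example, and the slides do not say which measure or adaptations were actually used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hierarchical_order(associations):
    """Average-linkage agglomerative clustering over categorical elements.

    associations maps each element name to the set of elements it is
    associated with. Returns (merges, leaf_order), where merges lists
    (left_leaves, right_leaves, distance) in merge order and leaf_order
    is the left-to-right leaf order of the resulting dendrogram.
    """
    names = sorted(associations)
    dist = {
        frozenset((x, y)): jaccard_distance(associations[x], associations[y])
        for x, y in combinations(names, 2)
    }
    clusters = [(n,) for n in names]  # each cluster is a tuple of leaves
    merges = []

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges, list(clusters[0])


def reorder_matrix(associations, order):
    """Adjacency matrix with rows and columns arranged in dendrogram leaf order."""
    return [[1 if col in associations[row] else 0 for col in order] for row in order]


# Hypothetical toy data: two groups whose members are interleaved alphabetically.
associations = {
    "a": {"a", "c", "e"},
    "b": {"b", "d"},
    "c": {"a", "c", "e"},
    "d": {"b", "d"},
    "e": {"a", "c", "e"},
}
merges, order = hierarchical_order(associations)
matrix = reorder_matrix(associations, order)
```

With the rows and columns arranged in leaf order, associated elements sit next to each other, so the groups appear as solid blocks along the diagonal instead of scattered cells.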
Interesting Features in the Dendrogram and Adjacency Matrix
Dendrogram and Adjacency Matrix Demo. Then switch to SOM and Node Link.
Magic… Hierarchy from Flatness and Clarity of Thought - Overview
Behind the Scenes. Convert articles into vectors. Compress the vectors to 24-bit space. Apply each vector to a random SOM. The best matching unit (BMU) learns, and so do its neighbors (Mexican hat). Group clusters (low-dimensional grouping). Create the node-link graph.
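The SOM steps above (random map, best matching unit, neighborhood learning) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Flex implementation from the project: the grid size, learning rate, decay schedule, the exact Mexican-hat formula, and all function names are hypothetical, and article vectors are stood in for by plain lists of floats.

```python
import math
import random


def mexican_hat(grid_dist, sigma):
    """Mexican-hat-shaped neighborhood (assumed form).

    Returns 1 at the BMU, 0 at grid_dist == sigma, and a small negative
    value beyond that, so far-away units are pushed slightly away.
    """
    r2 = (grid_dist / sigma) ** 2
    return (1.0 - r2) * math.exp(-r2 / 2.0)


def make_som(rows, cols, dim, rng):
    """Random SOM: maps each grid position (row, col) to a weight vector."""
    return {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }


def best_matching_unit(som, vector):
    """Grid position whose weights are closest to vector (squared Euclidean)."""
    return min(
        som,
        key=lambda unit: sum((w - v) ** 2 for w, v in zip(som[unit], vector)),
    )


def train_step(som, vector, rate, sigma):
    """Move the BMU and its grid neighbors toward vector; returns the BMU."""
    bmu = best_matching_unit(som, vector)
    for unit, weights in som.items():
        grid_dist = math.hypot(unit[0] - bmu[0], unit[1] - bmu[1])
        h = mexican_hat(grid_dist, sigma)
        for k, v in enumerate(vector):
            weights[k] += rate * h * (v - weights[k])
    return bmu


def train(som, vectors, epochs=20, rate=0.5, sigma=2.0):
    """Repeated passes with a linearly shrinking rate and neighborhood radius."""
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for vector in vectors:
            train_step(som, vector, rate * decay, max(sigma * decay, 0.5))
    return som


# Hypothetical usage: three-component vectors on a 4x4 map, fixed seed.
rng = random.Random(0)
som = make_som(4, 4, 3, rng)
sample = [1.0, 0.0, 0.0]
train(som, [sample, [0.0, 0.0, 1.0]])
sample_unit = best_matching_unit(som, sample)
```

After training, each input can be assigned to its BMU, and inputs that share or neighbor a BMU can be treated as one cluster when building the node-link graph.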
Results. Proves that a large corpus of articles can be reconstructed into a meaningful hierarchy. This hierarchy can then be traversed to explore articles. The node-link representation conveys both the distance between nodes and the relationships among them.
Lessons learned… where to begin. Don’t bite off more than you can chew: if it sounds like you are going to save the world, it might take more than 6 weeks to do it. When up against a short deadline, don’t try to learn a new language, even if they swear it decreases development time and a 6-year-old can use it (in other words, caveat emptor). There is always time to design, no matter how small you think your program will be. Conceptually, problems are easy to solve… practically, they aren’t.
Analysis, user feedback, conclusion, questions…