Jewels, Himalayas and Fireworks, Extending Methods for Visualizing N Dimensional Clustering W. Jockheck Dept. of Computer Science North Dakota State University Fargo, North Dakota 58105 William.Jockheck@ndsu.nodak.edu Dr. William Perrizo William.Perrizo@ndsu.nodak.edu CATA 2010 March 2010
Overview: Visualization of high-dimensional data is awkward and in some cases leads to incorrect conclusions. The human mind is the most sophisticated and effective pattern recognition device available for two-dimensional (or three-dimensional) patterns. This paper considers “pattern preserving” (cluster preserving) transformations of high-dimensional data into two-dimensional space. The techniques enable quick visual scans that can yield significant amounts of pattern-related information. Other methods can then be used for further drill-down.
Visualizing N Dimensional Clustering: Chernoff's faces; parallel coordinates; jewel diagram.
Table Columns as Dimensions: A table of n numeric columns can be considered a set of points in n-dimensional space (e.g., the first four IRIS columns).
Projecting a Hypercube into 2D: When projecting, an arbitrary 2D vector is used for each dimensional axis. Figure: a five-dimensional hypercube projected using the axes indicated.
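A minimal sketch of this kind of projection, assuming each dimension is assigned a 2D direction vector and a point's image is the sum of its coordinates times those vectors. The evenly spaced angles and the names `axis_vectors` and `project` are illustrative assumptions, not the axes shown on the slide.

```python
import itertools
import math

def axis_vectors(n):
    """Return n unit vectors in the plane, one per dimension.

    Assumption: angles are spread evenly over a half-turn; the slide uses
    its own chosen axes, which are not reproduced here.
    """
    return [(math.cos(math.pi * i / n), math.sin(math.pi * i / n)) for i in range(n)]

def project(point, axes):
    """Map an n-dimensional point to 2D as the sum of coordinate * axis vector."""
    x = sum(c * ax for c, (ax, _) in zip(point, axes))
    y = sum(c * ay for c, (_, ay) in zip(point, axes))
    return (x, y)

axes = axis_vectors(5)
# The 2**5 = 32 vertices of the unit hypercube in five dimensions.
vertices = list(itertools.product((0, 1), repeat=5))
projected = [project(v, axes) for v in vertices]
```

Drawing a line between every pair of projected vertices that differ in exactly one coordinate would give the familiar wireframe picture of the hypercube.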
Jewel Diagram and Projections: The single combined points in the jewel diagram represent a projection of the n-dimensional data that uses the sides of the polygon as the axes of the dimensions.
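The combined point can be sketched as follows, assuming it is the vector sum of the attribute values laid along successive sides of a regular polygon. That combination rule, the function names, and the sample values are assumptions made for illustration only.

```python
import math

def polygon_side_axes(n):
    """Unit direction vectors of the n sides of a regular polygon, in order."""
    step = 2 * math.pi / n  # each side turns by the exterior angle
    return [(math.cos(i * step), math.sin(i * step)) for i in range(n)]

def jewel_point(values, axes):
    """Sum each attribute value laid along its polygon side (assumed rule)."""
    x = sum(v * ax for v, (ax, _) in zip(values, axes))
    y = sum(v * ay for v, (_, ay) in zip(values, axes))
    return (x, y)

axes = polygon_side_axes(4)        # a square: one side per attribute
sample = (0.2, 0.6, 0.1, 0.9)      # made-up values, already scaled to [0, 1]
point = jewel_point(sample, axes)  # approximately (0.1, -0.3)
```

Under this rule, a sample with the same value on every attribute lands at the origin, because the side vectors of a closed polygon sum to zero.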
Jewel Diagram Revisited: Other methods of projecting points; variations on parallel coordinates; Himalayan variation (figure axis labels: A1, A5).
Jewel Diagram Revisited: Himalayan and odd-sided polygon versions.
Adjustments and Fixes: Variations on the arrangement and directionality of the axes were explored. In some variations, pairs of attributes offset each other. Generally, parallel or similar axes were found to minimize the visibility of differences. This led to placing all axes in a single quadrant (so that they don’t cancel out each other’s effects).
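The cancellation effect can be illustrated with a small sketch, assuming the projected point is the sum of attribute values times 2D axis vectors. The two axis layouts, the function names, and the sample values below are assumptions chosen to make the effect visible, not the exact layouts used in the talk.

```python
import math

def project(values, axes):
    """Sum of attribute value * axis vector (assumed projection rule)."""
    x = sum(v * ax for v, (ax, _) in zip(values, axes))
    y = sum(v * ay for v, (_, ay) in zip(values, axes))
    return (x, y)

def full_circle_axes(n):
    """Axes spread over the full circle, so some pairs point in opposite directions."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)) for i in range(n)]

def single_quadrant_axes(n):
    """Axes evenly spaced from 0 to 90 degrees (requires n >= 2)."""
    return [(math.cos(math.pi / 2 * i / (n - 1)), math.sin(math.pi / 2 * i / (n - 1)))
            for i in range(n)]

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two made-up samples that differ by the same amount on attributes 0 and 2.
a = (0.2, 0.5, 0.2, 0.5)
b = (0.7, 0.5, 0.7, 0.5)

circle = full_circle_axes(4)
quadrant = single_quadrant_axes(4)
gap_circle = distance(project(a, circle), project(b, circle))
gap_quadrant = distance(project(a, quadrant), project(b, quadrant))
```

With four axes on the full circle, attributes 0 and 2 point in opposite directions, so the two samples fall on the same spot; with all axes in one quadrant, every component is non-negative and the samples separate.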
Single Quadrant Variations: Uniform axis spacing; Himalayan axis spacing.
Fireworks
Injection of Noise: To test sensitivity to noise, additional random attributes were added to the iris data. In addition, various methods of favoring and arranging attributes were tested.
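A minimal sketch of such an experiment, assuming attributes are already scaled to [0, 1], that the added noise is uniform, and that "favoring" an attribute means multiplying it by a weight. The helper names, the rows, and the weights are hypothetical and are not taken from the talk.

```python
import random

def add_noise_columns(rows, extra, seed=0):
    """Append `extra` uniform random values in [0, 1) to every row."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    return [tuple(row) + tuple(rng.random() for _ in range(extra)) for row in rows]

def apply_weights(row, weights):
    """Scale each attribute by its weight; larger weights favor an attribute."""
    return tuple(v * w for v, w in zip(row, weights))

# Two made-up rows with four attributes each, scaled to [0, 1].
rows = [(0.22, 0.63, 0.07, 0.04), (0.17, 0.42, 0.07, 0.04)]
noisy = add_noise_columns(rows, extra=3)

# Favor the four original attributes and damp the three noise columns.
weights = (1.0, 1.0, 1.0, 1.0, 0.25, 0.25, 0.25)
weighted = [apply_weights(r, weights) for r in noisy]
```

Projecting `rows`, `noisy`, and `weighted` with the same axis layout would then show how far the random columns blur the clusters and how much weighting restores them.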
Variations with Weighting and Noise: Figure panels: Iris data; With Noise; All Weighted.
Why not 3D instead of 2D? The VRML version was a “3D” implementation with samples slightly offset along the Z axis. 2D rant: Humans only see in 2D. The retina, while curved, only captures a 2D projection. Computer displays are 2D. Printed outputs are 2D. VRML only allows modification of the projection into 2D.
Problems: As n increases, the jewel diagram polygon approaches a circle, the fireworks axes become indistinguishable, projected points tend to overlap and obscure one another, and the possibility of coincident points increases. Sequencing of attributes (sides/axes): alters the projected point distribution; conveys different information.
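One way to see why coincident points become more likely, assuming the projection is the linear sum of attribute values times 2D axis vectors (an assumption about the method, with $a_i$ denoting the axis vector for attribute $i$):

```latex
P(x) = \sum_{i=1}^{n} x_i \, a_i, \qquad a_i \in \mathbb{R}^2 .

\text{For } n > 2 \text{ the } a_i \text{ are linearly dependent, so there is } c \neq 0
\text{ with } \sum_{i=1}^{n} c_i \, a_i = 0, \text{ and therefore } P(x + c) = P(x).
```

The set of such offsets $c$ has dimension at least $n - 2$, so the room for distinct samples to share a projected point grows with each added attribute.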
Problem Mitigation: The reality of points and lines: a point has no dimensions, only location; lines have no width. The reality of displays: points have to have dimension to be visible; lines have to have width to be visible. Scaling as a solution.
Contribution of the Visualization: A single projected point represents each sample, showing its relationship to the rest of the data set. Provides a display of each attribute's distribution. Provides visual input, a full picture, for the user.
Summary: Jewel / Fireworks visualization of dimensions beyond 3. Provides a single-point projection for each sample (tuple). Computationally simple. Very modifiable and adaptable: colors, sequences, weighting, scaling.