16.1 Vis_2002 Data Visualization Lecture 14 Information Visualization Part 1
16.2 Vis_2002 What is Visualization?
- Generally:
  – The use of computer-supported, interactive, visual representations of data to amplify cognition (Card, Mackinlay and Shneiderman)
- Two ‘branches’:
  – Scientific Visualization
  – Information Visualization
But first… an experiment
16.3 Vis_2002 The Experiment
- You need a watch with a second-hand
- Without using pencil and paper (or a calculator!!), multiply 72 by 34
- How long did it take?
- Now you need pencil and paper as well as watch
- Multiply 47 by 54
- How long did it take?
- Conclusion?
16.4 Vis_2002 Visualization – Twin Subjects
- Scientific Visualization
  – Visualization of physical data
  – Example image: ozone layer around earth
- Information Visualization
  – Visualization of abstract data
  – Example image: automobile web site, visualizing links
… but this is only one characterisation
16.5 Vis_2002 Scientific Visualization – Another Characterisation
- Focus is on visualizing an entity measured in a multi-dimensional space
  – 1D
  – 2D
  – 3D
  – Occasionally nD
- Underlying field is recreated from the sampled data
- Relationship between variables well understood
  – some independent, some dependent
Image from D. Bartz and M. Meissner
16.6 Vis_2002 Scientific Visualization Model
- Visualization represented as pipeline:
  – Read in data
  – Build model of underlying entity
  – Construct a visualization in terms of geometry
  – Render geometry as image
- Realised as modular visualization environment
  – IRIS Explorer
  – IBM Open Visualization Data Explorer (DX)
  – AVS
[Figure: pipeline data → model → visualize → render]
16.7 Vis_2002 Information Visualization
- Focus is on visualizing a set of observations that are multi-variate
- Example: the iris data set
  – 150 observations of 4 variables (length and width of petal and sepal)
  – Techniques aim to display relationships between variables
16.8 Vis_2002 Dataflow for Information Visualization
- Again we can express this as a dataflow, but the emphasis now is on the data itself rather than an underlying entity
- First step is to form the data into a table of observations, each observation being a set of values of the variables
- Then we apply a visualization technique as before
[Figure: pipeline data → data table → visualize → render; the data table holds variables A, B, C against observations]
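The table-of-observations step can be sketched in a few lines of Python. This is only an illustration: the names `VARIABLES`, `table`, `column` and `value_range` are hypothetical, and the numbers are made-up sample values, not data from the lecture.

```python
# Hypothetical sketch: a data table is a list of observations,
# and each observation holds one value per variable.
VARIABLES = ("A", "B", "C")

# Illustrative values only (not real data).
table = [
    {"A": 1.0, "B": 10.0, "C": 5.0},
    {"A": 2.0, "B": 8.0, "C": 7.0},
    {"A": 3.0, "B": 6.0, "C": 6.0},
]


def column(table, variable):
    """Return all values of one variable, one per observation."""
    return [row[variable] for row in table]


def value_range(table, variable):
    """Return the (min, max) range of a variable across all observations."""
    values = column(table, variable)
    return min(values), max(values)


if __name__ == "__main__":
    for name in VARIABLES:
        print(name, column(table, name), value_range(table, name))
```

Any visualization technique that follows can then work from the columns and their ranges, whatever the original source of the data.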
16.9 Vis_2002 Applications of Information Visualization
- Data Collections
  – Census data
  – Astronomical data
  – Bioinformatics data
  – Supermarket checkout data
  – and so on
  – Can relationships be discovered amongst the variables?
- Networks of Information
  – traffic
  – Web documents
  – Hierarchies of information (e.g. filestores)
We shall see that all can be described as data tables
16.10 Vis_2002 Multivariate Visualization
- Software:
  – Xmdvtool (Matthew Ward)
- Techniques designed for any number of variables
  – Scatter plot matrices
  – Parallel co-ordinates
  – Glyph techniques
Acknowledgement: many of the images in the following slides are taken from Ward’s work... and also IRIS Explorer!
16.11 Vis_2002 Scatter Plot
- Simple technique for 2 variables is the scatter plot
This example from NIST shows linear correlation between the variables
handbook/eda/section3/ scatterp.htm
16.12 Vis_2002 3D Scatter Plots
- There has been some success at extending the concept to 3D for visualizing 3 variables
XRT/3d
16.13 Vis_2002 Extending to Higher Numbers of Variables
- Additional variables can be visualized by colour and shape coding
- IRIS Explorer used to visualize data from BMW
  – Five variables displayed using spatial arrangement for three, colour and object type for the others
  – Notice the clusters…
Kraus & Ertl
16.14 Vis_2002 IRIS Explorer 3D Scatter Plots
- Try this…
Thanks to:
16.15 Vis_2002 Scatter Plots for M variables
- For table data of M variables, we can look at pairs in 2D scatter plots
- The pairs can be juxtaposed:
[Figure: matrix of pairwise scatter plots for variables A, B, C]
With luck, you may spot correlations between pairs as linear structures
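As a rough numerical companion to the juxtaposed plots, the sketch below enumerates every pair of variables (one pair per off-diagonal panel) and computes the Pearson correlation coefficient for each. The coefficient is one common way to quantify a "linear structure", not something the slides prescribe. The function names and the sample columns are hypothetical and chosen so that the correlations are exactly +1 or -1.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists.

    Assumes neither variable is constant (otherwise the
    denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(columns):
    """Map each unordered pair of variable names to its correlation."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }


# Illustrative values only (not real data).
columns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],  # B = 2 * A, perfectly correlated
    "C": [4.0, 3.0, 2.0, 1.0],  # C = 5 - A, perfectly anti-correlated
}

if __name__ == "__main__":
    for pair, r in pairwise_correlations(columns).items():
        print(pair, round(r, 3))
```

For M variables there are M(M-1)/2 distinct pairs, which is why the matrix of plots grows quickly as variables are added.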
16.16 Vis_2002 Scatter Plot
Data represents 7 aspects of cars: what relationships can we notice? For example, what correlates with high MPG?
Pictures from Xmdv tool developed by Matthew Ward: davis.wpi.edu/~xmdv
16.17 Vis_2002 Parallel Coordinates: Visualizing M variables on one chart
[Figure: parallel vertical axes labelled A to F]
- create M equidistant vertical axes, each corresponding to a variable
- each axis scaled to the [min, max] range of the variable
- each observation corresponds to a line drawn through the point on each axis corresponding to the value of the variable
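The three construction steps can be sketched as a function that turns a data table into one polyline per observation. This is a minimal sketch under stated assumptions: the function name, the unit-square layout and the sample values are hypothetical, at least two variables are required, and placing a constant variable at mid-axis is a choice made here, not something the slides specify.

```python
def parallel_coordinates(table, variables, width=1.0):
    """Return one polyline (a list of (x, y) points) per observation.

    x is the position of the variable's axis; y is the value scaled
    to [0, 1] over that variable's [min, max] range.
    Assumes len(variables) >= 2.
    """
    m = len(variables)
    # Step 1: M equidistant vertical axes.
    xs = [width * i / (m - 1) for i in range(m)]
    # Step 2: each axis scaled to the [min, max] range of its variable.
    ranges = {
        v: (min(row[v] for row in table), max(row[v] for row in table))
        for v in variables
    }
    # Step 3: each observation becomes a line through one point per axis.
    polylines = []
    for row in table:
        points = []
        for x, v in zip(xs, variables):
            lo, hi = ranges[v]
            # Assumption: a constant variable is drawn at mid-axis.
            y = 0.5 if hi == lo else (row[v] - lo) / (hi - lo)
            points.append((x, y))
        polylines.append(points)
    return polylines


# Illustrative values only (not real data).
table = [
    {"A": 1, "B": 30, "C": 5},
    {"A": 3, "B": 10, "C": 5},
    {"A": 2, "B": 20, "C": 5},
]

if __name__ == "__main__":
    for line in parallel_coordinates(table, ["A", "B", "C"]):
        print(line)
```

In this sample, low values of A pair with high values of B, so the line segments between the A and B axes cross, which is the visual signature of negative correlation in this kind of chart.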
16.18 Vis_2002 Parallel Coordinates
[Figure: parallel coordinates plot with axes A to F]
- correlations may start to appear as the observations are plotted on the chart
- here there appears to be negative correlation between values of A and B, for example
- this has been used for applications with thousands of data items
16.19 Vis_2002 Parallel Coordinates Example
Detroit homicide data: 7 variables, 13 observations
16.20 Vis_2002 The Screen Space Problem
- All techniques, sooner or later, run out of screen space
- Parallel co-ordinates
  – Usable for up to 150 variates
  – Unworkable for more than 250 variates
(Remote sensing: 5 variates, 16,384 observations)
16.21 Vis_2002 Brushing as a Solution
- Brushing selects a restricted range of one or more variables
- The selection is then highlighted
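Brushing amounts to a range test applied to every observation. The sketch below assumes a brush is given as an inclusive (low, high) range per brushed variable; the function name `brush`, the field names and the car values are hypothetical and purely illustrative.

```python
def brush(table, ranges):
    """Return one boolean per observation.

    True means the observation lies inside every brushed range and
    should be highlighted. `ranges` maps a variable name to an
    inclusive (low, high) pair.
    """
    return [
        all(lo <= row[v] <= hi for v, (lo, hi) in ranges.items())
        for row in table
    ]


# Illustrative values only (not real data).
cars = [
    {"mpg": 18, "cylinders": 8},
    {"mpg": 35, "cylinders": 4},
    {"mpg": 31, "cylinders": 5},
    {"mpg": 22, "cylinders": 6},
]

if __name__ == "__main__":
    # Brush on one variable.
    print(brush(cars, {"mpg": (30, 50)}))
    # Brush on two variables at once.
    print(brush(cars, {"mpg": (30, 50), "cylinders": (5, 8)}))
```

A display would then draw the highlighted observations in a contrasting colour in every linked view, whether scatter plots or parallel coordinates.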
16.22 Vis_2002 Scatter Plot
Use of a ‘brushing’ tool can highlight subsets of data... now we can see what correlates with high MPG
16.23 Vis_2002 Parallel Coordinates
Brushing picks out the high MPG data. Can you observe the same relations as with scatter plots? Is it more or less easy?
16.24 Vis_2002 Parallel Coordinates
Here we highlight high MPG and not 4 cylinders
16.25 Vis_2002 Clustering as a Solution
- Success has been achieved through clustering of observations
- Hierarchical parallel co-ordinates
  – Cluster by similarity
  – Display using translucency and proximity-based colour
16.26 Vis_2002 Hierarchical Parallel Co-ordinates
16.27 Vis_2002 Reduction of Dimensionality of Variable Space
- Reduce number of variables, preserve information
- Principal Component Analysis
  – Transform to new co-ordinate system
  – Hard to interpret
- Hierarchical reduction of variable space
  – Cluster variables where distance between observations is typically small
  – Choose representative for each cluster
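To make the "transform to new co-ordinate system" idea concrete, the sketch below finds the first principal component of a small table using power iteration on the covariance matrix. Power iteration is a simple stand-in chosen here for brevity; the slides do not say how the components are computed, and real tools typically use an eigen-decomposition or SVD routine. All names and the sample rows are hypothetical.

```python
from math import sqrt


def covariance_matrix(rows):
    """Return (covariance matrix, mean-centred rows) for a list of rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    return cov, centred


def first_principal_component(rows, iterations=200):
    """Return (unit direction, scores) for the first principal component.

    Assumes the data are not all identical and that the starting
    vector is not orthogonal to the dominant direction.
    """
    cov, centred = covariance_matrix(rows)
    m = len(cov)
    v = [1.0] * m  # starting guess
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centred observation onto the new axis.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return v, scores


# Illustrative values only: the points lie exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

if __name__ == "__main__":
    direction, scores = first_principal_component(rows)
    print(direction, scores)
```

Here two variables collapse to one new axis along the direction (1, 2), with no loss of information. The new axis is a mixture of the original variables, which is why such components can be hard to interpret.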
16.28 Vis_2002 Glyph Techniques – Star Plots
- Star plots
  – Each observation represented as a ‘star’
  – Each spike represents a variable
  – Length of spike indicates the value
Example: crime in Detroit
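The geometry of a single star can be sketched as follows: one spike per variable, spaced at equal angles, with length proportional to the value scaled over that variable's range. The function name, the equal-angle layout starting at angle zero and the mid-length treatment of a constant variable are assumptions made for this illustration.

```python
from math import cos, pi, sin


def star_glyph(values, ranges, radius=1.0):
    """Return the spike end points (x, y) for one observation.

    The star is centred on the origin. `values` holds one value per
    variable and `ranges` the matching (min, max) pair for each.
    """
    m = len(values)
    points = []
    for k, (value, (lo, hi)) in enumerate(zip(values, ranges)):
        angle = 2 * pi * k / m  # spikes equally spaced around the circle
        # Assumption: a constant variable gets a half-length spike.
        scaled = 0.5 if hi == lo else (value - lo) / (hi - lo)
        length = radius * scaled
        points.append((length * cos(angle), length * sin(angle)))
    return points


if __name__ == "__main__":
    # Illustrative values only: four variables, each ranging over 0 to 10.
    print(star_glyph([10, 5, 0, 5], [(0, 10)] * 4))
```

Drawing a line from the origin to each end point gives the star for one observation; a full display repeats this for every observation, laid out side by side.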
16.29 Vis_2002 Chernoff Faces
- Chernoff suggested use of faces to encode a variety of variables
  – can map to size, shape, colour of facial features
  – human brain rapidly recognises faces
16.30 Vis_2002 Chernoff Faces
- Here are some of the facial features you can use
16.31 Vis_2002 Chernoff Faces
- Demonstration applet at:
  – ects/Faces/
16.32 Vis_2002 Chernoff’s Face
- ... And here is Chernoff’s face
Chernoff/Hcindex.htm
16.33 Vis_2002 Daisy Charts
- variables and their values placed around circle
- lines connect the values for one observation
[Figure: daisy chart with the values Dry, Wet, Showery; Saturday, Sunday; Leeds, Sahara, Amazon around the circle]
This item is { wet, Saturday, Amazon }
16.34 Vis_2002 Daisy Charts - Underground Problems
16.35 Vis_2002 Scientific Visualization – Information Visualization
Scientific Visualization
- Focus is on visualizing an entity measured in a multi-dimensional space
- Underlying field is recreated from the sampled data
- Relationship between variables well understood
Information Visualization
- Focus is on visualizing set of observations that are multi-variate
- There is no underlying field – it is the data itself we want to visualize
- The relationship between variables is not well understood
16.36 Vis_2002 Further Reading
- Information Visualization
  – Robert Spence
  – published 2000 by Addison Wesley
- See also resources section of the module web site