Information Visualization in Data Mining S.T. Balke Department of Chemical Engineering and Applied Chemistry University of Toronto.


1 Information Visualization in Data Mining S.T. Balke Department of Chemical Engineering and Applied Chemistry University of Toronto

2 Motivation
Data visualization
–relies primarily on human cognition for value discovery;
–permits direct incorporation of human ingenuity and analytic capabilities into data mining;
–can very effectively deal with very large quantities of data;
–powerfully combines with machine-based discovery techniques.

3 Uses
Explorative Analysis
–Data cleaning
–Provide hypotheses
Confirmative Analysis
–Confirm or reject hypotheses
Presentation
–Communicate your work

4 http://www.alz.washington.edu/DATA2001/GERALD1/sld011.htm

5 Calculated Properties of the Anscombe Data Sets
–mean of the x values = 9.0
–mean of the y values = 7.5
–equation of the least-squares regression line: y = 3 + 0.5x
–sum of squares of x about its mean = 110.0

6 Calculated Properties of the Anscombe Data Sets
–regression sum of squares (variance accounted for by x) = 27.5
–residual sum of squares (about the regression line) = 13.75
–correlation coefficient = 0.82
–coefficient of determination = 0.67
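The calculated properties listed for the Anscombe data sets can be checked with a short script. The sketch below is not part of the original slides: the data values are the ones published by Anscombe (1973), and the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative choices. It uses plain Python so that every formula is visible.

```python
# Anscombe's quartet. The values come from Anscombe's 1973 paper,
# not from the slides, which only list the summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slides for one data set."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Ordinary least-squares line y = intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Split the total sum of squares of y into regression and residual parts.
    ss_regression = slope * sxy
    ss_residual = syy - ss_regression
    r = sxy / (sxx * syy) ** 0.5
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sxx": sxx,
        "slope": slope,
        "intercept": intercept,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "r": r,
        "r_squared": r * r,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four data sets give nearly the same numbers, even though they look very different when plotted. That is the reason for showing them in a talk on visualization.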

7 The Anscombe Data

8 Marey, 1885

9 Snow’s Cholera Map, 1855

10 http://pupgg.princeton.edu/disk20/anonymous/groth/lick/licknorth.gif

11 Graphical Excellence
Graphical displays should:
–show the data
–induce the viewer to think about the substance, not the methodology
–avoid distorting what the data says
–present many numbers in a small space
–make large data sets coherent
–encourage the eye to compare different pieces of data
–reveal the data at several levels of detail (broad overview to fine structure)
–serve a reasonably clear purpose: description, exploration, tabulation, or decoration
–be closely integrated with the statistical and verbal descriptions of the data set.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

12 Graphical Excellence
–Gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
–Nearly always multivariate.
–Requires telling the truth about the data.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

13 Lie Factor=14.8 (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

14 Lie Factor Require: 0.95<Lie Factor<1.05 (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

15 Using Area for One Dimensional Data Lie Factor=2.8 (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
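The slides quote lie factors without stating how they are computed. In Tufte's book the lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data, with each effect measured as a relative change. The sketch below assumes that definition. The function names are illustrative, and the example numbers are those of the fuel-economy graphic discussed in Tufte's book; they do not appear in the slide text.

```python
def relative_change(first, second):
    """Size of an effect as a fraction of the starting value."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


def is_acceptable(factor, low=0.95, high=1.05):
    """Apply the requirement 0.95 < Lie Factor < 1.05."""
    return low < factor < high


# Fuel-economy example as given in Tufte's book (not in the slide text):
# the standard rises from 18 to 27.5 miles per gallon, while the lines
# drawn to represent it grow from 0.6 to 5.3 inches.
FUEL_ECONOMY_LIE_FACTOR = lie_factor(0.6, 5.3, 18.0, 27.5)

if __name__ == "__main__":
    print(round(FUEL_ECONOMY_LIE_FACTOR, 1))
    print(is_acceptable(FUEL_ECONOMY_LIE_FACTOR))
```

A graphic that scales its marks in proportion to the data has a lie factor of 1.0 and passes the check.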

16 More guidelines:
–The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
–No legends: use labels on the graph.
–Graphics must not quote data out of context.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

17 Data-Ink Ratio
–Data-ink ratio = proportion of a graphic’s ink devoted to the non-redundant display of data-information.
–Data-ink ratio = 1.0 - (proportion of a graphic that can be erased without loss of data-information)
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
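The two forms of the data-ink ratio are equivalent, as a small calculation shows. This is only a sketch: how "ink" is measured (for example, counted pixels) is an assumption left to the reader, the function names are illustrative, and the numbers in the example are hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Proportion of a graphic's ink that displays non-redundant data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def data_ink_ratio_from_erasable(erasable_fraction):
    """Second form: 1.0 minus the fraction erasable without loss of data."""
    if not 0.0 <= erasable_fraction <= 1.0:
        raise ValueError("erasable_fraction must lie between 0 and 1")
    return 1.0 - erasable_fraction


if __name__ == "__main__":
    # Hypothetical chart: 120 units of ink in total, 30 of which carry data.
    print(data_ink_ratio(30, 120))
    # The remaining 75% could be erased, which gives the same ratio.
    print(data_ink_ratio_from_erasable(0.75))
```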

18 Maximize Data Density (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

19 Beware Chartjunk
NO: “Isn’t it remarkable that the computer can be programmed to draw like that?”
YES: “My, what interesting data!”
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

20 How to Say Nothing with Information Visualization
http://www.crs4.it/~zip/13ways.html
–Never include a color legend.
–Avoid annotation.
–Never mention error characteristics of the visualization method.
–When in doubt, smooth.
–Don’t say how long it required to plot.
–Never compare your results with other data visualization techniques.
–Never cite references for the data.
–Claim generality but show results from a single data set.
–Use viewing angle to hide blemishes in 3D objects.

21 An Overview of Information Visualization Methods
http://www.informatik.uni-halle.de/~keim/tutorials.html

22 Methods of Interest
–Scatterplot Matrices
–Parallel Coordinates
–Pixel Oriented Methods
–Icon based Methods
–Dimensional Stacking
–Treemap

23 Assignment 1: see handout

24 Some websites of interest:
http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/
http://www.cs.man.ac.uk/~ngg/InfoViz/Projects_and_Products/Visualization/
Try a search at google.com using the following key words together: name_of_method download software

