Information Visualization in Data Mining S.T. Balke Department of Chemical Engineering and Applied Chemistry University of Toronto.



Motivation
Data visualization
–relies primarily on human cognition for value discovery;
–permits direct incorporation of human ingenuity and analytic capabilities into data mining;
–can very effectively deal with very large quantities of data;
–powerfully combines with machine-based discovery techniques.

Uses
Explorative Analysis
–Data cleaning
–Provide hypotheses
Confirmative Analysis
–Confirm or reject hypotheses
Presentation
–Communicate your work

Calculated Properties of the Anscombe Data Sets
mean of the x values = 9.0
mean of the y values = 7.5
equation of the least-squares regression line: y = 3 + 0.5x
sum of squared deviations of x (about the mean) = 110.0

Calculated Properties of the Anscombe Data Sets
regression sum of squares (variance accounted for by x) = 27.5
residual sum of squares (about the regression line) = 13.75
correlation coefficient = 0.82
coefficient of determination = 0.67
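The summary statistics listed above can be recomputed directly. The sketch below is illustrative only: the data values are not in this transcript and are the commonly published values of Anscombe's quartet (Anscombe, 1973), and the names `QUARTET` and `summarize` are made up for this example. It shows that all four data sets share essentially the same summaries, which is why plotting them matters.

```python
from math import sqrt

# Anscombe's quartet as commonly published (Anscombe, 1973).
# ASSUMPTION: these values do not appear in the transcript itself.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the least-squares summary statistics for paired samples x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    regression_ss = slope * sxy          # variation in y explained by x
    residual_ss = syy - regression_ss    # variation left about the fitted line
    r = sxy / sqrt(sxx * syy)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sxx": sxx,
        "slope": slope,
        "intercept": intercept,
        "regression_ss": regression_ss,
        "residual_ss": residual_ss,
        "r": r,
        "r_squared": r * r,
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Each of the four rows printed should agree with the slide values to about two decimal places, even though the four scatterplots look nothing alike.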

The Anscombe Data

Marey, 1885

Snow’s Cholera Map, 1855

Graphical Excellence
Graphical displays should:
–show the data
–induce the viewer to think about the substance, not the methodology
–avoid distorting what the data says
–present many numbers in a small space
–make large data sets coherent
–encourage the eye to compare different pieces of data
–reveal the data at several levels of detail (broad overview to fine structure)
–serve a reasonably clear purpose: description, exploration, tabulation, or decoration
–be closely integrated with the statistical and verbal descriptions of the data set.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Graphical Excellence
–Gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
–Nearly always multivariate.
–Requires telling the truth about the data.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Lie Factor=14.8 (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Lie Factor
Require: 0.95 < Lie Factor < 1.05
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
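As a rough sketch of how the lie factor can be checked: Tufte defines it as the size of the effect shown in the graphic divided by the size of the effect in the data, with each effect taken as a relative change. The function names below are made up for this example, and the fuel-economy figures (line lengths of 0.6 and 5.3 inches representing 18 and 27.5 miles per gallon) are recalled from Tufte's book rather than taken from this transcript.

```python
def relative_change(first, second):
    """Relative change from first to second, e.g. 0.5 for a 50% increase."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


def is_acceptable(factor, low=0.95, high=1.05):
    """Apply the band quoted on the slide: 0.95 < lie factor < 1.05."""
    return low < factor < high


if __name__ == "__main__":
    # ASSUMPTION: figures recalled from Tufte's fuel-economy example.
    factor = lie_factor(0.6, 5.3, 18.0, 27.5)
    print(round(factor, 1), is_acceptable(factor))
```

A graphic whose drawn sizes are proportional to the data gives a lie factor of exactly 1.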

Using Area for One Dimensional Data
Lie Factor = 2.8
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
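Why area inflates one-dimensional data can be seen with a little arithmetic. The sketch below assumes a pictogram whose width and height are both scaled by the ratio of two data values, so that its area grows with the square of that ratio. The function names and the example ratios are hypothetical and are not taken from Tufte's figure.

```python
from math import sqrt


def area_lie_factor(value_ratio):
    """Lie factor when width AND height are both scaled by value_ratio.

    The data effect is (value_ratio - 1); the area effect is
    (value_ratio**2 - 1), so the quotient simplifies to value_ratio + 1.
    """
    data_effect = value_ratio - 1.0
    area_effect = value_ratio ** 2 - 1.0
    return area_effect / data_effect


def honest_linear_scale(value_ratio):
    """Linear scale factor that makes the AREA proportional to the data."""
    return sqrt(value_ratio)


if __name__ == "__main__":
    # Hypothetical case: the second value is twice the first.
    print(area_lie_factor(2.0))      # area quadruples while the data doubles
    print(honest_linear_scale(2.0))  # scale each side by about 1.41 instead
```

Under this assumption the distortion grows with the size of the change being shown, which is one reason to prefer length over area for a single variable.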

More guidelines:
–The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
–No legends: use labels on the graph.
–Graphics must not quote data out of context.
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

Data Ink Ratio
Data ink ratio = proportion of a graphic’s ink devoted to the non-redundant display of data-information.
Data ink ratio = 1.0 - (proportion of a graphic that can be erased without loss of data-information)
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
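The two forms of the definition above are equivalent, as a small sketch shows. It assumes that "ink" can be measured in some consistent unit, such as a count of non-background pixels. That measurement choice, the function names, and the numbers are all hypothetical.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that displays non-redundant data-information."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_fraction(data_ink, total_ink):
    """Share of the graphic that could be erased without losing data-information."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


if __name__ == "__main__":
    # Hypothetical chart: 1200 inked pixels, of which 300 encode data.
    print(data_ink_ratio(300, 1200))     # 0.25
    print(erasable_fraction(300, 1200))  # 0.75
```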

Maximize Data Density (E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)
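Tufte measures data density as the number of entries in the data matrix divided by the area of the graphic. A minimal sketch follows, with a hypothetical function name and made-up dimensions. The unit of area is whatever the caller chooses, so densities are only comparable when the units match.

```python
def data_density(n_entries, width, height):
    """Entries in the data matrix per unit area of the graphic."""
    area = width * height
    if area <= 0:
        raise ValueError("width and height must be positive")
    return n_entries / area


if __name__ == "__main__":
    # Hypothetical comparison on the same 10 x 5 area:
    print(data_density(4, 10.0, 5.0))     # a four-bar chart
    print(data_density(1000, 10.0, 5.0))  # a 500-point scatterplot (x and y per point)
```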

Beware Chartjunk
NO: “Isn’t it remarkable that the computer can be programmed to draw like that.”
YES: “My, what interesting data!”
(E.R. Tufte, “The Visual Display of Quantitative Information”, 2nd edition)

How to Say Nothing with Information Visualization
–Never include a color legend.
–Avoid annotation.
–Never mention error characteristics of the visualization method.
–When in doubt, smooth.
–Don’t say how long it required to plot.
–Never compare your results with other data visualization techniques.
–Never cite references for the data.
–Claim generality but show results from a single data set.
–Use viewing angle to hide blemishes in 3D objects.

An Overview of Information Visualization Methods halle.de/~keim/tutorials.html

Methods of Interest
–Scatterplot Matrices
–Parallel Coordinates
–Pixel Oriented Methods
–Icon based Methods
–Dimensional Stacking
–Treemap
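To give a feel for one of the methods listed above, the sketch below computes the geometry behind a parallel coordinates plot: each dimension becomes a vertical axis, and each record becomes a polyline crossing every axis at a height set by its value. It is an illustration, not a full implementation. Per-axis min-max scaling is an assumed choice, the function name and sample records are hypothetical, and drawing is left to whatever plotting tool is at hand.

```python
def parallel_coordinates(records, dimensions):
    """Return one polyline per record as a list of (axis_index, height) pairs.

    Heights are min-max scaled to [0, 1] separately for each dimension.
    A dimension whose values are all equal is drawn at mid-height.
    """
    lows = {d: min(r[d] for r in records) for d in dimensions}
    highs = {d: max(r[d] for r in records) for d in dimensions}
    polylines = []
    for record in records:
        line = []
        for axis_index, d in enumerate(dimensions):
            span = highs[d] - lows[d]
            height = 0.5 if span == 0 else (record[d] - lows[d]) / span
            line.append((axis_index, height))
        polylines.append(line)
    return polylines


if __name__ == "__main__":
    # Hypothetical three-dimensional records.
    rows = [
        {"a": 1, "b": 10, "c": 5},
        {"a": 3, "b": 20, "c": 5},
        {"a": 2, "b": 30, "c": 5},
    ]
    for line in parallel_coordinates(rows, ["a", "b", "c"]):
        print(line)
```

Connecting each record's points from left to right gives the familiar picture, and the order of `dimensions` decides which pairs of variables sit on adjacent axes and can be compared directly.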

Assignment 1: see handout

Some websites of interest:
Public_Domain_Software/
Public_Domain_Software/
Visualization/
Visualization/
Try a search at google.com using the following key words together: name_of_method download software