CSE5230 Data Mining, 2001 - Lecture 8.1: Data Mining and Information Visualization (CSE5230/DMS/2001/8)



Lecture 8.2: Lecture Outline
- Overview of information visualization
- The role of visualization in the process of data mining
- The patterns being sought: clusters and outliers
- Issues when visualizing higher-dimensional relationships
- Criteria for comparison
- A range of visualization techniques for exploratory data analysis

Lecture 8.3: Information Visualization
- A conjunction of a number of fields:
  - Data Mining
  - Cognitive Science
  - Graphic Design
  - Interactive Computer Graphics
- Information visualization attempts to use visual approaches and dynamic controls to provide understanding and analysis of multidimensional data
- The data may have no inherent 2D or 3D semantics and may be abstract in nature. There is no underlying physical model. Much of the data in databases is of this type

Lecture 8.4: Role of Information Visualization
- Acts as an exploratory tool
- Useful for identifying subsets of the data
- Structures, trends and outliers may be identified
- Statistical tests, by contrast, tend to incorporate isolated instances into a broader model as they attempt to formulate global features
- There is no requirement for a hypothesis, but the techniques can also support the formulation of hypotheses if wanted

Lecture 8.5: Integrating Visualization with Data Mining
- There are four possible approaches:
  - Use the visualization technique to present the results of the data mining process
  - Use visualization techniques as complements to the data mining process. They complement and increase understanding in a passive way
  - Use visualization techniques to steer the data mining process. The visualization aids in deciding the appropriate data mining technique to use and the appropriate subsets of the data to consider
  - Apply data mining techniques to the visualization rather than directly to the data. The idea is to capture the essential semantics visually, then apply the data mining tools

Lecture 8.6: The Process of Knowledge Discovery in Databases (a.k.a. Data Mining)
[Figure: The Knowledge Discovery in Databases (KDD) process [AdZ1996]. Stages: data selection; cleaning (domain consistency, de-duplication, disambiguation); enrichment; coding; data mining (clustering, segmentation, prediction); reporting. Inputs: operational data and external data. Other labels: requirement, information, action, feedback.]

Lecture 8.7: Visualization in the Context of the Data Mining Process
- Visualization tools can potentially be used at a number of steps in the DM process. But:
  - the same tools may not be appropriate at each step
  - how they will be used may be different
- In general, it is not important whether data visualization is the first step in the process or not
  - the feedback loop which moves the process forward may be commenced by either a visualization or a query
- Some visualizations (e.g. see slide 25) require an initial query to generate a visualization
  - this is an example of a complementary approach
    - questions generate visualizations, which may prompt further questions or generate hypotheses

Lecture 8.8: Motivations for Visualization
- The human visual system is extremely good at recognizing patterns
  - it is quicker and easier to understand visual representations than to absorb information from language or formal notations
- Exploratory visualization assists in:
  - identifying areas of interest
  - identifying questions which might usefully be asked
- i.e. a relevant or revealing visualization of either part or all of a data set may suggest useful questions and/or hypotheses to the analyst. These can then be confirmed by more rigorous approaches
  - e.g. some clustering techniques require an initial estimate of the number of clusters present in the data
    - visualization techniques can assist in this estimation

Lecture 8.9: Criteria for Comparison of Visualization Tools
- Number of dimensions that can be represented
- Number of data items that can be handled
- Ability to handle categorical and other non-numeric data types
- Ability to reveal patterns
- Ease of use
- Learning curve (to what degree is the technique intuitive?)

Lecture 8.10: Examples - Scatterplot
- For each pair of features (i.e. fields of records) in a multidimensional database, every record is graphed as a point in two dimensions (2D)
  - This straightforward graphing procedure produces a simple scatterplot: a projection of the multidimensional data into 2D
- The scatterplots of all pair-wise combinations of features are arranged in a matrix
  - The figure on the following slide illustrates a scatterplot matrix of 3D data from a study of abrasion loss in tyres. The features are hardness, tensile strength and abrasion loss [Tie1989]
- Each "sub-graph" gives insight into the relationship between a pair of features
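The construction of the panels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record values are invented (they are not the [Tie1989] data), and the function name `scatterplot_matrix_panels` is hypothetical. Each panel is simply the list of 2D points obtained by keeping two features of every record.

```python
from itertools import combinations

# Hypothetical records; the values are invented for illustration only.
FEATURES = ("hardness", "tensile_strength", "abrasion_loss")
RECORDS = [
    (40, 150, 300),
    (50, 200, 250),
    (60, 220, 180),
    (70, 240, 120),
]

def scatterplot_matrix_panels(features, records):
    """Return {(x_feature, y_feature): [(x, y), ...]} for every feature pair."""
    panels = {}
    for i, j in combinations(range(len(features)), 2):
        # Project each record onto the two chosen features.
        panels[(features[i], features[j])] = [(r[i], r[j]) for r in records]
    return panels

panels = scatterplot_matrix_panels(FEATURES, RECORDS)
for pair, points in panels.items():
    print(pair, points)
```

A plotting library would then draw one scatterplot per dictionary entry and lay the panels out in a grid.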

Lecture 8.11: Scatterplot Matrix
[Figure: Scatterplot matrix of abrasion loss data [Tie1989]]

Lecture 8.12: Possible Problems with Scatterplots
- Everitt [Eve1978, p. 5] gives two reasons why scatterplots can prove unsatisfactory:
  - if the number of features is greater than ~10, the number of plots to be examined is very large
    - this is just as likely to lead to confusion as to knowledge of the structures in the data
  - structures existing in a multidimensional data set do not necessarily appear in the 2D projections of the features represented in scatterplots (see next slide)
- Despite these potential problems, variations on the scatterplot approach are the most commonly used of all the visualization techniques
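To put a number on the first point: a scatterplot matrix over d features needs one distinct plot for each unordered pair of features. The worked count below is standard combinatorics, added here for illustration; it is not taken from Everitt.

```latex
\[
  \binom{d}{2} = \frac{d(d-1)}{2},
  \qquad
  d = 10 \;\Rightarrow\; \frac{10 \cdot 9}{2} = 45,
  \qquad
  d = 20 \;\Rightarrow\; \frac{20 \cdot 19}{2} = 190 .
\]
```

So doubling the number of features roughly quadruples the number of plots to inspect.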

Lecture 8.13: Scatterplots: recognizing high-dimensional structures - 1
- A structure which appears as a cluster in a 2D projection may in fact be a "pipe" in 3D
  - a pipe is a structure that looks like a rod or pipe when viewed in a 3D representation
- While the pipe is easily identifiable in a 3D display, only projections of it will appear in the 2D components of the scatterplot matrix
  - depending on the orientation of the pipe in 3D, it may not appear as an obvious cluster, if at all
- Equivalent structures can exist in higher dimensions, e.g. a cluster in 5D might be a "pipe" in 6D
  - the appearance of high-D structures in lower-D projections depends on the luck and skill of the analyst in choosing the projections, and on the alignment of the structures to the axes
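The effect of orientation can be checked numerically. The sketch below is an illustration only: it assumes a thin rod-like region aligned with the z axis (with a square cross-section for simplicity), and the function names are hypothetical. Projected onto x-y the points form a compact blob; projected onto x-z the same points form a long streak.

```python
import random

def make_pipe(n_points, radius=0.5, length=100.0, seed=0):
    """Sample points inside a thin rod whose axis runs along z."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        z = rng.uniform(0.0, length)
        points.append((x, y, z))
    return points

def extent(values):
    """Range (max - min) of a sequence of numbers."""
    return max(values) - min(values)

def projection_extents(points, axis_a, axis_b):
    """Extent of the 2D projection onto the two chosen axes."""
    return (extent([p[axis_a] for p in points]),
            extent([p[axis_b] for p in points]))

pipe = make_pipe(500)
xy = projection_extents(pipe, 0, 1)  # looking along the rod: compact blob
xz = projection_extents(pipe, 0, 2)  # looking side-on: long thin streak
print("x-y extents:", xy)
print("x-z extents:", xz)
```

If the rod were tilted relative to all three axes, every 2D panel would show an elongated smear, which is the case where the structure is hardest to recognise.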

Lecture 8.14: Scatterplots: recognizing high-dimensional structures - 2
[Figure: two 2D projections and their possible 3D interpretations]
- Random (uniform) in 2D: may be a plane in 3D
- A cluster in 2D: may be a pipe in 3D (or a cluster in 3D)

Lecture 8.15: Example Tool: Spotfire
[Figure: Spotfire display of the example data set]

Lecture 8.16: Example Tool: Spotfire
- The user interacts with the data by choosing which features will form the horizontal and vertical axes
- Other features can be represented by colour
  - this is an example of using the richness of visual representations to provide more information to the user. As well as 2D spatial position, other modes such as colour, size, shape and even sound can be used to convey information about high-dimensional data
- On the previous slide, the data set contains a 3D cluster in a 4D space (i.e. there are four features)
  - there are also some background "noise" instances
- The cluster can be seen, with its centre at around (20, 74)
  - all the points in the cluster are red, i.e. they also share similar values of the feature mapped to colour, showing that it is a 3D cluster

Lecture 8.17: Example Tool: DBMiner
[Figure: DBMiner data cube display]

Lecture 8.18: Example Tool: DBMiner
- DBMiner is an integrated data mining tool
- It employs a data visualization known as a "data cube" (see On-Line Analytic Processing - OLAP)
- After creating a data cube, the user can apply a variety of data mining techniques to analyze the data further, including:
  - association, classification, prediction and clustering, etc.
- The figure on the preceding slide shows a data cube for a data set which has a 3D cluster of data instances in a 3D space

Lecture 8.19: Examples: Parallel Coordinates - 1
- Uses the idea of mapping a point in a multidimensional feature space onto a number of parallel axes
- Each feature is mapped to one axis
  - as many axes as needed can be lined up side by side
  - there is no limit to the number of dimensions that can be represented
- A single polygonal line connects the individual coordinate mappings for each point
- The technique has been applied in air traffic control, robotics, computer vision and computational geometry

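Lecture 8.20: Examples: Parallel Coordinates - 2
- Parallel axes for $\mathbb{R}^n$. The polygonal line shown represents the point $C = (C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n)$
[Figure: parallel vertical axes $X_1, X_2, X_3, \ldots, X_{i-1}, \ldots, X_n$, crossed by a polygonal line at the values $C_1, \ldots, C_{i-1}, C_i, \ldots, C_n$]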
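The mapping from a point to its polygonal line can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes each axis is rescaled to [0, 1] using the minimum and maximum of that feature, and the data values and the function name `parallel_coordinates_polyline` are hypothetical.

```python
def parallel_coordinates_polyline(point, minima, maxima):
    """Map an n-dimensional point to the vertices of its polygonal line.

    Axis i is drawn at horizontal position i; the vertical position is the
    feature value rescaled to [0, 1] using that feature's min and max.
    """
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(point, minima, maxima)):
        # A constant feature has no spread, so place it mid-axis.
        height = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((i, height))
    return vertices

# Hypothetical data set with four features.
data = [
    (1.0, 200.0, 0.3, 7.0),
    (3.0, 100.0, 0.9, 5.0),
    (2.0, 150.0, 0.6, 9.0),
]
minima = [min(col) for col in zip(*data)]
maxima = [max(col) for col in zip(*data)]

polylines = [parallel_coordinates_polyline(p, minima, maxima) for p in data]
for line in polylines:
    print(line)
```

Drawing straight segments between consecutive vertices gives one polygonal line per record; adding a feature only adds one more axis and one more vertex per line.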

Lecture 8.21: Examples: Parallel Coordinates - 3
- The parallel coordinates visualization technique is employed in the software WinViz
- The main advantage of the technique is that it can represent an unlimited number of dimensions
- When many points are represented using parallel coordinates, the overlap of the polygonal lines can make it difficult to identify structures in the data
- Certain structures, such as clusters, can often be identified, but others are hidden due to the overlap

Lecture 8.22: Two Clusters in WinViz
[Figure: WinViz parallel coordinates display showing two clusters]

Lecture 8.23: Examples: Stick Figures
- The stick figure technique is intended to make use of the user's low-level perceptual processes [PGL1995], such as perception of:
  - texture, colour, motion and depth
- The hope is that the user will "automatically" try to make physical sense of the pictures of the data created
- Visualizations which represent multidimensional feature spaces by using a number of subspaces of 3D or less (e.g. scatterplots) rely more on our cognitive abilities than on our perceptual abilities
- Stick figures avoid this, and present all variables and data points in a single representation

Lecture 8.24: Iconographic display using stick figures - US Census Data
[Figure: iconographic stick-figure display of US Census data]

Lecture 8.25: Examples: Pixel-based techniques
- Query-dependent pixel-based techniques
  - based on a query, a "semantic distance" is calculated between each of the query feature values and the corresponding feature of each instance in the DB
  - an overall distance between the data values for a specific instance and the data attribute values used in the predicate of the query is also calculated
  - if a feature value for a specific instance matches the query, it is assigned a colour indicating a match
    - e.g. a sequence of colours starting from yellow and ending in black could be used, where black is assigned if none of the instance features match
  - instances are arranged on the screen with the data items of highest relevance in the centre of the display, and then proceeding outwards in a spiral
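The steps above can be sketched as a small program. It is a rough illustration under several assumptions that the slide does not specify: the overall distance is taken to be the sum of absolute per-feature differences, the spiral is a square grid spiral, and the colour scale is a linear blend from yellow to black. All function names and data values are hypothetical.

```python
def overall_distance(instance, query):
    """Sum of per-feature absolute differences (an assumed distance)."""
    return sum(abs(a - b) for a, b in zip(instance, query))

def match_colour(distance, max_distance):
    """Blend linearly from yellow (exact match) to black (worst match)."""
    closeness = 1.0 if max_distance == 0 else 1.0 - distance / max_distance
    level = round(255 * closeness)
    return (level, level, 0)  # (red, green, blue)

def spiral_positions(count):
    """Yield `count` grid cells spiralling outwards from the centre (0, 0)."""
    x = y = 0
    dx, dy = 1, 0
    step_length = 1
    produced = 0
    while produced < count:
        for _ in range(2):               # two legs share each step length
            for _ in range(step_length):
                if produced == count:
                    return
                yield (x, y)
                produced += 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx             # turn by 90 degrees
        step_length += 1

def arrange(instances, query):
    """Place instances on the spiral, most relevant (closest) first."""
    ranked = sorted(instances, key=lambda inst: overall_distance(inst, query))
    return list(zip(spiral_positions(len(ranked)), ranked))

query = (5, 5)
instances = [(9, 9), (5, 5), (5, 6), (1, 5)]
layout = arrange(instances, query)
worst = max(overall_distance(inst, query) for inst in instances)
for position, inst in layout:
    print(position, inst, match_colour(overall_distance(inst, query), worst))
```

The exact match lands in the centre cell and is coloured yellow, while the least relevant instance ends up furthest along the spiral and is coloured black.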

Lecture 8.26: Examples: Worlds within Worlds
- Employs virtual reality devices to represent an nD virtual world in 3D or 4D-Hyperworlds
  - the basic approach to reducing the complexity of a multidimensional function is to hold one or more of its independent variables constant
    - this is equivalent to taking an infinitely thin slice of the world perpendicular to the constant variable's axis
  - this can be repeated until there are 3 dimensions, and the resulting slice can be manipulated and displayed with conventional 3D graphics hardware
- After reducing the higher-dimensional space to 3 dimensions, the additional dimensions can be added back by adding additional 3D worlds within the first 3D world
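Holding variables constant is easy to express in code. The sketch below is an illustration only: the function `f` and the helper `hold_constant` are hypothetical, and it covers just the slicing step, not the nested 3D display. A function of four independent variables is reduced to a function of two, whose graph is a surface that ordinary 3D graphics can show.

```python
def hold_constant(func, fixed):
    """Return a function of the remaining variables, with `fixed` held constant.

    `fixed` maps argument positions to the constant values they take.
    """
    def sliced(*free_values):
        free = iter(free_values)
        n_args = len(fixed) + len(free_values)
        args = [fixed[i] if i in fixed else next(free) for i in range(n_args)]
        return func(*args)
    return sliced

# Hypothetical function of four independent variables.
def f(a, b, c, d):
    return a + 10 * b + 100 * c + 1000 * d

# Hold c = 1 and d = 2 constant: what remains is a surface over (a, b).
slice_one = hold_constant(f, {2: 1, 3: 2})
# A different choice of constants gives a different slice of the same world.
slice_two = hold_constant(f, {2: 0, 3: 0})

print(slice_one(1, 2))
print(slice_two(1, 2))
```

In the Worlds within Worlds display, the constants chosen for the held variables would correspond to where the inner world is positioned inside the outer one.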

Lecture 8.27: Dynamic Techniques
- Allow interaction with the visualization to explore the data more effectively. Can potentially be applied to all visualization techniques
  - Dynamic linking of the data attributes to the parameters of the visualization
  - Filtering
  - Linking and "brushing" between multiple visualizations
  - Zooming
  - Details on demand
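Linking and brushing can be illustrated with a small sketch. It assumes the simplest possible design, in which a brush is a value range on one feature and the selection is shared between views as a set of record indices; the records and the function names `brush` and `linked_view` are hypothetical.

```python
def brush(records, feature, low, high):
    """Return the indices of records whose `feature` lies in [low, high]."""
    return {i for i, rec in enumerate(records) if low <= rec[feature] <= high}

def linked_view(records, x_feature, y_feature, selected):
    """Points for one view, each flagged with whether it is brushed."""
    return [(rec[x_feature], rec[y_feature], i in selected)
            for i, rec in enumerate(records)]

# Hypothetical records with three features.
records = [
    {"age": 23, "income": 31, "spend": 12},
    {"age": 35, "income": 58, "spend": 20},
    {"age": 41, "income": 62, "spend": 9},
    {"age": 58, "income": 44, "spend": 15},
]

selected = brush(records, "age", 30, 45)  # brush drawn in a view of age
view_b = linked_view(records, "income", "spend", selected)
for point in view_b:
    print(point)
```

Because the selection is stored by record index rather than by screen position, the same records are highlighted in every linked view; returning only the flagged points instead would turn the brush into a filter.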

Lecture 8.28: Other Techniques
- Keim and Kriegel's query-independent approach
- Chernoff faces
- Cone trees
- Perspective walls
- Visualization Spreadsheet
- A number of techniques especially developed for web pages and their links

Lecture 8.29: References
- [AdZ1996] P. Adriaans and D. Zantinge, Data Mining, Addison-Wesley.
- [BeS1997] A. Berson and S. J. Smith, Data Warehousing, Data Mining and OLAP, McGraw-Hill, 1997.
- [Eve1978] B. S. Everitt, Graphical Techniques for Multivariate Data, Heinemann Educational Books Ltd., London, 1978.
- [Thu1999] B. Thuraisingham, Data Mining: Technologies, Techniques, Tools, and Trends, CRC Press LLC, Boca Raton, Florida, 1999.
- [Tie1989] L. Tierney, XLISP-STAT: A Statistical Environment Based on the XLISP Language (Version 2.0), University of Minnesota School of Statistics, Technical Report Number 528, July 1989.
- [PGL1995] R. M. Pickett, G. Grinstein, H. Levkowitz and S. Smith, Harnessing Preattentive Perceptual Processes in Visualization, in Perceptual Issues in Visualization (Eds. G. Grinstein and H. Levkowitz), Springer-Verlag, Berlin, 1995.
- [WGL1996] Database Issues for Data Visualization, Proceedings of the IEEE Visualization '95 Workshop, A. Wierse, G. G. Grinstein and U. Lang (eds), Atlanta, Georgia, USA, October 28, 1995.
- [LeG1993] Database Issues for Data Visualization, Proceedings of the IEEE Visualization '93 Workshop, J. P. Lee and G. G. Grinstein (eds), San Jose, California, USA, October 26, 1993.