Summer Student Program 15 August 2007 Cluster visualization using parallel coordinates representation Bastien Dalla Piazza Supervisor: Olivier Couet.



Parallel Coordinates Plots (1/11)

Parallel coordinates plots (||-Coord) are a common way of studying and visualizing multivariate data sets. They were proposed by A. Inselberg in 1981 as a new way to represent multi-dimensional information. In traditional Cartesian coordinates, the axes are mutually perpendicular. In parallel coordinates, all axes are parallel, which makes it possible to represent data in many more than 3 dimensions.

To show a set of points in ||-Coord, a set of parallel lines is drawn, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes. The position of the vertex on the i-th axis corresponds to the i-th coordinate of the point.
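The mapping from a point to its polyline can be sketched in a few lines of standard C++. This is only an illustration, not the ROOT implementation: the names Vertex and toPolyline, the per-axis ranges lo and hi, and the normalised [0, 1] plot units are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One polyline vertex: horizontal position of the axis and height along
// that axis, both in normalised [0, 1] plot units (an assumed convention).
struct Vertex {
    double axisPos;
    double height;
};

// Map an n-dimensional point to its polyline. lo[i] and hi[i] are the
// (hypothetical) value range displayed on the i-th axis.
std::vector<Vertex> toPolyline(const std::vector<double>& point,
                               const std::vector<double>& lo,
                               const std::vector<double>& hi)
{
    const std::size_t n = point.size();
    assert(n >= 2 && lo.size() == n && hi.size() == n);

    std::vector<Vertex> poly;
    poly.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(hi[i] > lo[i]);
        // Axes are vertical and equally spaced between 0 and 1.
        const double axisPos =
            static_cast<double>(i) / static_cast<double>(n - 1);
        // The vertex height on the i-th axis encodes the i-th coordinate.
        const double height = (point[i] - lo[i]) / (hi[i] - lo[i]);
        poly.push_back({axisPos, height});
    }
    return poly;
}
```

For instance, with every axis spanning [-5, 5], the six-dimensional point (-5,3,4,2,0,1) gets its first vertex at the bottom of the first axis and its last vertex at height 0.6 on the sixth axis.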

Parallel Coordinates Plots (2/11)

The ||-Coord representation of the six-dimensional point (-5,3,4,2,0,1) is:
[figure]

The line y = -3x+20 in Cartesian coordinates is:
[figure]

It appears like this in ||-Coord:
[figure]

The same can be done for a circle. In Cartesian coordinates:
[figure]

In ||-Coord:
[figure]
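The line example rests on a standard property of parallel coordinates: every point (x, y) of a line y = mx + b with m different from 1 becomes a segment between the two axes, and all these segments pass through one common point. The sketch below checks this numerically. It assumes the x axis sits at horizontal position 0 and the y axis at position d; the names Point2D, dualPoint and segmentHeight are made up for the illustration.

```cpp
#include <cassert>

// A position in the plot plane: h is horizontal, v is vertical.
struct Point2D {
    double h;
    double v;
};

// Common intersection point of all segments representing the line
// y = m*x + b, with the x axis at h = 0 and the y axis at h = d.
Point2D dualPoint(double m, double b, double d)
{
    assert(m != 1.0);  // for m == 1 the segments are parallel
    return {d / (1.0 - m), b / (1.0 - m)};
}

// Height at horizontal position h of the straight line through
// (0, x) and (d, y), i.e. the segment representing the point (x, y).
double segmentHeight(double x, double y, double d, double h)
{
    return x + (y - x) * (h / d);
}
```

For y = -3x+20 and d = 1 the common point is (0.25, 5): it lies between the two axes, which is typical of a negative slope.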

Parallel Coordinates Plots (3/11)

||-Coord plots are a widely used technique to display and explore multi-dimensional data. They are good at spotting irregular events, showing data trends, and finding correlations and clusters. Their main weakness is the cluttering of the output, but there are techniques to reduce it.

My project was to implement ||-Coord plots in ROOT as a new plotting option "PARA" in the TTree::Draw() method.

Parallel Coordinates Plots (4/11)

This "pseudo C++" code produces the data set we'll use to show the ||-Coord usage:

    // Pseudo C++: rnd stands for a random value (noise);
    // (s1x,s1y,s1z), (s2x,s2y,s2z), (s3x,s3y,s3z) are random points on three spheres.
    void parallel_example() {
       TNtuple *nt = new TNtuple("nt","Demo ntuple","x:y:z:u:v:w:a:b:c");
       for (Int_t i=0; i<3000; i++) {
          nt->Fill( rnd,   rnd,   rnd,   rnd,    rnd,    rnd,    rnd, rnd, rnd );
          nt->Fill( s1x,   s1y,   s1z,   s2x,    s2y,    s2z,    rnd, rnd, rnd );
          nt->Fill( rnd,   rnd,   rnd,   rnd,    rnd,    rnd,    rnd, s3y, rnd );
          nt->Fill( s2x-1, s2y-1, s2z,   s1x+.5, s1y+.5, s1z+.5, rnd, rnd, rnd );
          nt->Fill( rnd,   rnd,   rnd,   rnd,    rnd,    rnd,    rnd, rnd, rnd );
          nt->Fill( s1x+1, s1y+1, s1z+1, s3x-2,  s3y-2,  s3z-2,  rnd, rnd, rnd );
          nt->Fill( rnd,   rnd,   rnd,   rnd,    rnd,    rnd,    s3x, rnd, s3z );
          nt->Fill( rnd,   rnd,   rnd,   rnd,    rnd,    rnd,    rnd, rnd, rnd );
       }
    }

9 variables: x, y, z, u, v, w, a, b, c.
3000*8 = 24000 events.
Three sets of random points distributed on spheres: s1, s2, s3.
Random values (noise): rnd.
6 "spheres" correlated 2 by 2 on the variables x, y, z, u, v, w.
The variables a, b, c are almost completely random. a and c are correlated via the 1st and 3rd coordinates of the 3rd "sphere".

Parallel Coordinates Plots (5/11)

The command used is: nt->Draw("x:a:y:b:z:u:c:v:w");
It gives:
[figure]
Not very useful…

To show better where the clusters are, a 1D histogram is associated with each axis. The histograms can be represented with a color palette, and their thickness can be changed. They can also be represented as bar charts. But the clusters are still not visible: the image cluttering is very high!

We have implemented a very simple technique to reduce the cluttering: instead of painting solid lines, we paint dotted lines. The space between the dots is a parameter which can be adjusted to get the best result. The clusters (in this case the "spheres") now appear clearly!
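The per-axis histogram amounts to binning the values of one variable over the range shown on its axis. Here is a minimal standalone sketch, assuming equal-width bins; the function name axisHistogram and its parameters are hypothetical and do not reflect how ROOT does it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many values of one variable fall into each of nbins
// equal-width bins covering the axis range [lo, hi].
std::vector<int> axisHistogram(const std::vector<double>& values,
                               double lo, double hi, std::size_t nbins)
{
    assert(hi > lo && nbins > 0);
    std::vector<int> counts(nbins, 0);
    for (double v : values) {
        if (v < lo || v > hi) continue;  // outside the axis range
        std::size_t bin = static_cast<std::size_t>(
            (v - lo) / (hi - lo) * static_cast<double>(nbins));
        if (bin == nbins) bin = nbins - 1;  // v == hi goes into the last bin
        ++counts[bin];
    }
    return counts;
}
```

Bins with high counts mark the positions along the axis where many polylines cross it, which is where a cluster is likely to be.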

Parallel Coordinates Plots (6/11)

The order in which the axes are displayed is very important to show clusters:
- Swap a and y
- Move z after y
- Move u before v
- Swap b and c

All the clusters we have introduced in the data set are now clearly visible. Moving u, v, w after x, y, z shows that these 6 variables are correlated.

Parallel Coordinates Plots (7/11)

To pursue the exploration of the data set further, one can use selections. A selection is a set of ranges combined together. Within a selection, ranges along the same axis are combined with OR, and ranges on different axes with AND. A selection is displayed on top of the complete data set using its own color. Only the events fulfilling the selection criteria (ranges) are displayed. Ranges are defined interactively using cursors.
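The OR/AND rule can be written down as a small predicate. The sketch below is a standalone illustration of that rule only; Range and passesSelection are hypothetical names, and an axis without any range is assumed to impose no constraint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One range of a selection: an interval [min, max] on a given axis.
struct Range {
    std::size_t axis;
    double min;
    double max;
};

// An event passes the selection if, on every axis that carries at least
// one range, its value lies inside at least one of those ranges
// (OR along an axis, AND across axes).
bool passesSelection(const std::vector<double>& event,
                     const std::vector<Range>& selection)
{
    std::vector<bool> constrained(event.size(), false);
    std::vector<bool> matched(event.size(), false);
    for (const Range& r : selection) {
        assert(r.axis < event.size());
        constrained[r.axis] = true;
        if (event[r.axis] >= r.min && event[r.axis] <= r.max)
            matched[r.axis] = true;
    }
    for (std::size_t i = 0; i < event.size(); ++i)
        if (constrained[i] && !matched[i]) return false;
    return true;
}
```

With two ranges [0, 2] and [8, 10] on the first axis and one range [4, 6] on the second, the event (9, 5, 0) passes, while (5, 5, 0) fails on the first axis and (1, 7, 0) fails on the second.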

Parallel Coordinates Plots (8/11)

Several selections can be defined. Each selection has its own color. Thanks to the multiple selections, this zone with crossing clusters is now understandable.

Parallel Coordinates Plots (9/11)

Selections make it possible to choose events precisely.
- This single selection, displayed with an appropriate dot spacing, clearly shows a cluster.
- Displayed with solid lines, the cluttering shows up again.
- Adding a range clears the picture.
- A third range reveals one single event outside the cluster. It would have been hard to see with dot spacing.
- A final adjustment selects the cluster precisely on 6 variables.

Parallel Coordinates Plots (10/11)

Selections can be saved as a TEntryList and applied to the original tree. Apply the selection to the tree via a TEntryList, then draw:

nt->Draw("u:v:w")
nt->Draw("x:y:z")

Conclusion

Achievements so far:
- The parallel coordinates representation allows the exploration of data sets with an arbitrary number of variables.
- Correlations between variables appear clearly when playing with the selection tools.
- Accurate selections can be made on noisy data sets.

Further development:
- The dot spacing trick is not sufficient to explore data sets of 10^5 or more entries. Some ways around that could be:
  - Draw the lines with transparency. The clusters would appear as dense regions.
  - Apply statistical cuts over the entries, to select only the similar ones.
- The order of the axes matters a lot. Sorting algorithms could be implemented to choose an order that reflects the correlations between variables.
- Automated cluster selection algorithms could also be implemented.
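One possible ingredient for a correlation-based axis ordering is the Pearson correlation coefficient between pairs of variables. The slides do not say which measure or algorithm was intended, so the following is only a sketch under that assumption; pearson and mostCorrelatedAxis are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally long columns.
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    assert(sxx > 0.0 && syy > 0.0);  // constant columns have no correlation
    return sxy / std::sqrt(sxx * syy);
}

// Index of the variable most strongly correlated (in absolute value)
// with variable `ref`: a candidate for the neighbouring axis.
std::size_t mostCorrelatedAxis(const std::vector<std::vector<double>>& columns,
                               std::size_t ref)
{
    assert(columns.size() >= 2 && ref < columns.size());
    std::size_t best = (ref == 0) ? 1 : 0;
    double bestAbs = -1.0;
    for (std::size_t j = 0; j < columns.size(); ++j) {
        if (j == ref) continue;
        const double r = std::fabs(pearson(columns[ref], columns[j]));
        if (r > bestAbs) { bestAbs = r; best = j; }
    }
    return best;
}
```

A greedy ordering could start from one axis and repeatedly append the most correlated remaining variable. Pearson correlation only captures linear relations, so clusters such as the "spheres" used here might need a different similarity measure.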

Acknowledgements

Thanks a lot to:
- My supervisor Olivier Couet, who designed these slides and provided a very nice work environment,
- René Brun, the ROOT big boss,
- And the Summer Student Program staff for their support.

Questions?