Published by Gerard Fisher. Modified over 9 years ago.
1
Visualizing and Exploring Data
2
Outline
1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling
3
Introduction
Visual methods are well suited to sifting through data to find unexpected relationships.
Exploratory data analysis aims to find structure that may indicate deeper relationships between cases or variables.
4
Summarizing Data: Some Simple Examples
Measures of location: mean, median, first quartile, third quartile, deciles, percentiles, mode.
5
Summarizing Data: Some Simple Examples (Cont.)
Suppose that x(1), x(2), …, x(n) comprise a set of n data values.
Sample mean: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$
$\mu$: true mean of the population
$\hat{\mu}$: estimate of the true mean
6
Summarizing Data: Some Simple Examples (Cont.)
The sample mean minimizes the sum of squared differences between it and the data values.
Ex. data set {1,2,3,4,5}
μ = 3: $\sum_{i}(x(i)-3)^2 = 4+1+0+1+4 = 10$
μ = 1: $\sum_{i}(x(i)-1)^2 = 0+1+4+9+16 = 30$
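The minimizing property can be checked numerically. The following is a minimal sketch on the slide's data set; the helper name `sum_squared_diff` is an assumption made for illustration.

```python
def sum_squared_diff(data, c):
    """Sum of squared differences between a candidate value c and each data value."""
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Compare the sample mean against another candidate location value.
sse_at_mean = sum_squared_diff(data, mean)
sse_at_one = sum_squared_diff(data, 1)

print(mean, sse_at_mean, sse_at_one)
```

Any other candidate value gives a sum at least as large as the one obtained at the mean.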
7
Summarizing Data: Some Simple Examples (Cont.)
Median: the value that has an equal number of data points above and below it.
Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5
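A short sketch of the median rule used in the two examples: take the middle value for an odd number of points, and the average of the two middle values for an even number. The function name is an assumption for illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5]))
print(median([1, 2, 3, 4, 5, 6]))
```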
8
Summarizing Data: Some Simple Examples (Cont.)
First quartile: the value that is greater than a quarter of the data points.
Third quartile: the value that is greater than three quarters of the data points.
Interquartile range: the difference between the third and first quartiles.
Range: the difference between the largest and smallest data points.
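These measures might be computed as in the sketch below. The data set is hypothetical, and the choice of `method="inclusive"` is an assumption: several quartile conventions exist and they can disagree slightly on small samples.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample, not from the slides

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

interquartile_range = q3 - q1
data_range = max(data) - min(data)

print(q1, q3, interquartile_range, data_range)
```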
9
Summarizing Data: Some Simple Examples (Cont.)
Percentiles: the value of a variable below which a certain percentage of observations fall.
Deciles: the percentiles at multiples of 10% (the 10th, 20th, …, 90th).
10
Summarizing Data: Some Simple Examples (Cont.)
Mode: the value that occurs most frequently in a data set or a probability distribution.
Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Mode = 1, 4
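Both examples can be reproduced with the standard library's `statistics.multimode`, which returns every value tied for the highest count. This is only one convenient way to compute the mode.

```python
from statistics import multimode

# multimode returns all most-frequent values, in order of first appearance.
modes_a = multimode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
modes_b = multimode([1, 1, 2, 4, 4])

print(modes_a, modes_b)
```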
11
Summarizing Data: Some Simple Examples (Cont.)
Unimodal: a data set or a distribution with one mode.
Bimodal: a data set or a distribution with two modes.
Multimodal: a data set or a distribution with several modes.
12
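Summarizing Data: Some Simple Examples (Cont.)
Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x(i)-\mu\right)^2$
If $\mu$ is replaced with the sample mean $\hat{\mu}$, then the variance is estimated as $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x(i)-\hat{\mu}\right)^2$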
13
Summarizing Data: Some Simple Examples (Cont.)
Standard deviation: the square root of the variance, $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x(i)-\mu\right)^2}$
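The sketch below computes the variance and standard deviation under the usual definitions, which are assumed here: divide by n when the true mean μ is known, and by n − 1 when μ is replaced by the sample mean. The function names are illustrative.

```python
import math

def variance_known_mean(data, mu):
    """Variance around a known true mean mu (divides by n)."""
    return sum((x - mu) ** 2 for x in data) / len(data)

def variance_estimate(data):
    """Variance estimate around the sample mean (divides by n - 1)."""
    n = len(data)
    mu_hat = sum(data) / n
    return sum((x - mu_hat) ** 2 for x in data) / (n - 1)

data = [1, 2, 3, 4, 5]
var_known = variance_known_mean(data, 3)
var_est = variance_estimate(data)

# The standard deviation is the square root of the variance.
std_known = math.sqrt(var_known)
std_est = math.sqrt(var_est)

print(var_known, var_est, std_known, std_est)
```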
14
Summarizing Data: Some Simple Examples (Cont.)
Skewness: measures whether or not a distribution has a single long tail.
A distribution is said to be right-skewed if the long tail extends in the direction of increasing values, and left-skewed if it extends in the direction of decreasing values.
Symmetric distributions have zero skewness.
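The sign behaviour described above can be illustrated with one common skewness statistic, the third central moment divided by the second central moment raised to the power 3/2. This particular formula is an assumption, since the slide does not state one, and the data sets are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using central moments (assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 10]       # long tail toward large values
left_tailed = [-10, -2, -2, -1, -1]   # long tail toward small values

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```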
15
Tools for Displaying Single Variable
Histogram-1
16
Tools for Displaying Single Variable (Cont.)
Histogram-2
17
Tools for Displaying Single Variable (Cont.)
Kernel estimate
A single variable X has measured values {x(1), x(2), …, x(n)}.
$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K\left(x - x(i),\, h\right)$
K(): kernel function, commonly a Gaussian curve
h: width
18
Tools for Displaying Single Variable (Cont.)
Gaussian curve: $K(t, h) = C\, e^{-\frac{1}{2}\left(t/h\right)^2}$
C: normalization constant
t = x − x(i)
h: standard deviation
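A minimal sketch of a Gaussian kernel estimate follows. It assumes the estimate is the average of the kernels centered at the data points, and that C = 1/(h√(2π)) so that each kernel integrates to one. The data values and the width h are hypothetical.

```python
import math

def gaussian_kernel(t, h):
    """Gaussian curve with standard deviation h, evaluated at distance t."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))  # assumed normalization constant
    return c * math.exp(-0.5 * (t / h) ** 2)

def kernel_estimate(x, data, h):
    """Average of the kernels centered at each measured value x(i)."""
    return sum(gaussian_kernel(x - xi, h) for xi in data) / len(data)

data = [1.0, 2.0, 2.5, 4.0]  # hypothetical measurements
h = 0.5                      # hypothetical width

# Evaluate the estimate on a grid and check that it integrates to about one.
step = 0.01
grid = [-5.0 + step * k for k in range(1501)]
area = sum(kernel_estimate(x, data, h) for x in grid) * step

print(round(area, 4))
```

A smaller h gives a spikier estimate that follows individual points; a larger h gives a smoother one.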
20
Tools for Displaying Single Variable (Cont.)
Box and whisker plot
21
Tools for Displaying Relationships between Two Variables
Scatterplot
22
Tools for Displaying Relationships between Two Variables (Cont.)
Contour plot
23
Tools for Displaying More Than Two Variables
Scatterplot matrix
24
Tools for Displaying More Than Two Variables (Cont.)
Trellis plot
25
Tools for Displaying More Than Two Variables (Cont.)
Star plot
26
Tools for Displaying More Than Two Variables (Cont.)
Chernoff faces
27
Tools for Displaying More Than Two Variables (Cont.)
Parallel coordinates plot
28
Principal Components Analysis
Objective: to find the vectors onto which the projected data retain maximum variance.
Advantage: this method can reduce the dimensionality of the data.
29
Principal Components Analysis (Cont.)
Suppose X is an n × p data matrix in which each row is a data vector x and the columns represent the variables.
X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).
30
Principal Components Analysis (Cont.)
Let a be a p × 1 column vector of projection weights. The projection of a data vector x along a is $a^T x$.
Projecting all data vectors in X onto a gives Xa, an n × 1 column vector of projected values.
31
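Principal Components Analysis (Cont.)
Define the variance along a as $\sigma_a^2 = (Xa)^T(Xa) = a^T X^T X a = a^T V a$
$V = X^T X$: the p × p covariance matrix of the data (up to a constant factor, since X is mean-centered)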
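The identity between the variance of the projected values and the quadratic form in V can be checked on a small case. This sketch assumes V = XᵀX without any 1/n scaling; the data matrix and the direction a are hypothetical.

```python
import math

X_raw = [[2.0, 1.0], [3.0, 4.0], [5.0, 4.0], [6.0, 7.0]]  # hypothetical n x p data
n, p = len(X_raw), len(X_raw[0])

# Mean-center each column.
col_means = [sum(row[j] for row in X_raw) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X_raw]

# V = X^T X (unscaled, an assumption of this sketch).
V = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]

# A unit-length projection vector a.
a = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

# Variance along a, computed two ways: (Xa)^T (Xa) and a^T V a.
projected = [sum(row[j] * a[j] for j in range(p)) for row in X]
variance_from_projection = sum(v * v for v in projected)
variance_from_quadratic_form = sum(a[j] * V[j][k] * a[k] for j in range(p) for k in range(p))

print(variance_from_projection, variance_from_quadratic_form)
```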
32
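Principal Components Analysis (Cont.)
Impose the constraint $a^T a = 1$ and use a Lagrange multiplier $\lambda$ to find the a that maximizes the variance along a: $u = a^T V a - \lambda\,(a^T a - 1)$
Differentiating with respect to a yields $\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0$, i.e. $(V - \lambda I)\,a = 0$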
33
Principal Components Analysis (Cont.)
The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V.
The second principal component is the eigenvector associated with the second largest eigenvalue; its direction is orthogonal to the first, and so on.
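One simple way to find the eigenvector with the largest eigenvalue is power iteration. The sketch below uses it on hypothetical data lying exactly on the line y = 2x; it is a swapped-in numerical technique chosen for brevity, not necessarily the method used in the presentation, and it assumes V = XᵀX on mean-centered data.

```python
import math

def first_principal_component(X_raw, iters=1000):
    """Power iteration on V = X^T X of the mean-centered data (illustrative sketch)."""
    n, p = len(X_raw), len(X_raw[0])
    means = [sum(row[j] for row in X_raw) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X_raw]
    V = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]

    a = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iters):
        w = [sum(V[j][k] * a[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("starting vector has no component along any eigenvector")
        a = [c / norm for c in w]

    # Rayleigh quotient a^T V a gives the eigenvalue for the unit vector a.
    eigenvalue = sum(a[j] * V[j][k] * a[k] for j in range(p) for k in range(p))
    return eigenvalue, a

points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # hypothetical data on y = 2x
eigenvalue, direction = first_principal_component(points)

print(eigenvalue, direction)
```

For this data all of the variance lies along the direction (1, 2), so the first principal component points along that line.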
34
Principal Components Analysis (Cont.)
When the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as $\sum_{j=1}^{k} \lambda_j$
$\lambda_j$: the j-th eigenvalue
35
Principal Components Analysis (Cont.)
The loss of data
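A common way to quantify the loss from keeping only the first k components is the share of the total eigenvalue sum that is discarded. The sketch below assumes that definition, which the slide does not spell out; the eigenvalues and function names are hypothetical.

```python
def variance_retained(eigenvalues, k):
    """Fraction of total variance kept by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

def variance_lost(eigenvalues, k):
    """Fraction of total variance discarded (assumed measure of the loss)."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[k:]) / sum(ordered)

eigenvalues = [4.0, 2.0, 1.0, 1.0]  # hypothetical eigenvalues

print(variance_retained(eigenvalues, 2), variance_lost(eigenvalues, 2))
```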
36
Principal Components Analysis (Cont.)
Scree plot
37
Principal Components Analysis (Cont.)
Ex.
269.8  38.9  50.5
272.4  39.5  50.0
272.0  39.3  50.2
268.2  38.6  50.2
268.2  38.6  50.8
267.0  38.2  51.1
267.8  38.4  51.0
273.6  39.6  50.0
271.2  39.1  50.4
270.0  38.9  50.5
40
Multidimensional Scaling
Objective: to represent the data points in a lower-dimensional space while preserving, as far as possible, the distances between them.
41
Multidimensional Scaling (Cont.)
Classical multidimensional scaling
Metric multidimensional scaling
Non-metric multidimensional scaling
42
Multidimensional Scaling (Cont.)
Assume a 3 × 2 data matrix X in which the mean of each variable is zero.
Then compute the 3 × 3 matrix $B = X X^T$, whose entries are $b_{ij} = \sum_{k=1}^{2} x_{ik}\, x_{jk}$
43
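Multidimensional Scaling (Cont.)
The squared Euclidean distance between objects 1 and 2 is $d_{12}^2 = (x_{11}-x_{21})^2 + (x_{12}-x_{22})^2 = b_{11} + b_{22} - 2\,b_{12}$, where $b_{ij}$ are the entries of B.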
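The link between squared distances and the entries of B can be verified on a small case. This sketch assumes B = XXᵀ and uses a hypothetical 3 × 2 matrix whose columns have mean zero.

```python
X = [[-1.0, 2.0], [0.0, -3.0], [1.0, 1.0]]  # hypothetical data, each column sums to zero

# B = X X^T: entry (i, j) is the inner product of rows i and j.
B = [[sum(u * v for u, v in zip(row_i, row_j)) for row_j in X] for row_i in X]

# Squared distance between objects 1 and 2, computed directly from the coordinates.
d12_squared_direct = sum((u - v) ** 2 for u, v in zip(X[0], X[1]))

# The same quantity recovered from B alone.
d12_squared_from_b = B[0][0] + B[1][1] - 2 * B[0][1]

print(d12_squared_direct, d12_squared_from_b)
```

The same identity holds for every pair of objects, which is what lets classical scaling work from distances alone.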
44
Multidimensional Scaling (Cont.)
Define a 3 × 3 distance matrix D whose entries are the pairwise distances between the objects.
47
Multidimensional Scaling (Cont.)
Apply singular value decomposition to B.
48
Multidimensional Scaling (Cont.)
We keep the first r eigenvalues, those that are much larger than the rest; r determines the number of dimensions we map the data to.
49
Multidimensional Scaling (Cont.)
Ex.
Data:
1  2  8
3  4  5
5  6  9
Eigenvalues: 16.9641, 7.7025, 0
Distance (data):
0       4.1231  5.7446
4.1231  0       4.8990
5.7446  4.8990  0
Transformed data:
-2.4621   1.5436
-0.7528  -2.2085
 3.2149   0.6649
Distance (transformed data):
0       4.1231  5.7446
4.1231  0       4.8990
5.7446  4.8990  0
Stress: 1.0325e-016
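The example can be approximately reproduced with the sketch below. It assumes classical scaling on B = XXᵀ of the mean-centered data, with coordinates taken as each eigenvector scaled by the square root of its eigenvalue. Power iteration with deflation stands in for the decomposition to keep the code dependency-free; that is a substitution, not the presentation's method. Eigenvector signs are arbitrary, so the coordinates may come out mirrored while the distances stay the same.

```python
import math

def classical_mds(X_raw, r, iters=500):
    """Top-r eigenpairs of B = X X^T by power iteration with deflation (sketch)."""
    n, p = len(X_raw), len(X_raw[0])
    means = [sum(row[k] for row in X_raw) / n for k in range(p)]
    X = [[row[k] - means[k] for k in range(p)] for row in X_raw]
    B = [[sum(u * v for u, v in zip(X[i], X[j])) for j in range(n)] for i in range(n)]

    eigenvalues = []
    coords = [[0.0] * r for _ in range(n)]
    for d in range(r):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-constant start vector
        start_norm = math.sqrt(sum(c * c for c in v))
        v = [c / start_norm for c in v]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
        eigenvalues.append(lam)
        for i in range(n):
            coords[i][d] = math.sqrt(max(lam, 0.0)) * v[i]
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return eigenvalues, coords

data = [[1.0, 2.0, 8.0], [3.0, 4.0, 5.0], [5.0, 6.0, 9.0]]  # data from the example
eigenvalues, coords = classical_mds(data, 2)

print([round(e, 4) for e in eigenvalues])
print([[round(c, 4) for c in row] for row in coords])
```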
50
Multidimensional Scaling (Cont.)
Stress: compares, for each pair of points i and j, the observed distance between them in the p-dimensional space with the distance between the points representing these objects in the two-dimensional space.
Sstress
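The sketch below evaluates both criteria for the worked example. The formulas are assumptions, since several variants are in use: stress is taken as the sum of squared differences between observed distances (called `delta` here) and representing distances (`d`), divided by the sum of squared representing distances; sstress is taken as the sum of squared differences between the squared distances. The transformed coordinates are the rounded values from the example, so the result is near zero rather than at machine precision.

```python
import math

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def stress(delta, d):
    """Assumed definition: sum (delta_ij - d_ij)^2 / sum d_ij^2."""
    n = len(delta)
    num = sum((delta[i][j] - d[i][j]) ** 2 for i in range(n) for j in range(n))
    den = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
    return num / den

def sstress(delta, d):
    """Assumed definition: sum (delta_ij^2 - d_ij^2)^2."""
    n = len(delta)
    return sum((delta[i][j] ** 2 - d[i][j] ** 2) ** 2 for i in range(n) for j in range(n))

data = [[1, 2, 8], [3, 4, 5], [5, 6, 9]]                            # original 3-D points
transformed = [[-2.4621, 1.5436], [-0.7528, -2.2085], [3.2149, 0.6649]]  # rounded 2-D points

delta = pairwise_distances(data)       # observed distances
d = pairwise_distances(transformed)    # distances in the two-dimensional space

print(stress(delta, d), sstress(delta, d))
```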