
Visualizing and Exploring Data

Outline
1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling

Introduction
Visual methods are well suited to sifting through data to find unexpected relationships.
Exploratory data analysis looks for structure that may indicate deeper relationships between cases or variables.

Summarizing Data: Some Simple Examples
Measures of location: mean, median, first quartile, third quartile, deciles, percentiles, mode.

Summarizing Data: Some Simple Examples (Cont.)
Suppose that x(1), x(2), ..., x(n) comprise a set of n data values.
Sample mean: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$
μ: true mean of the population
$\hat{\mu}$: estimate of the true mean

Summarizing Data: Some Simple Examples (Cont.)
The sample mean minimizes the sum of squared differences between itself and the data values.
Ex. data set {1,2,3,4,5}
μ = 3: $\sum_i (x(i) - 3)^2 = 10$
μ = 1: $\sum_i (x(i) - 1)^2 = 30$
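The minimizing property of the sample mean can be checked numerically. The sketch below is only an illustration: the helper name `sum_squared_diff` and the grid of candidate values are assumptions introduced here, not part of the slides.

```python
def sum_squared_diff(data, c):
    """Sum of squared differences between a candidate value c and the data."""
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Hypothetical grid of candidate values: 0.0, 0.5, ..., 6.0.
candidates = [k / 2 for k in range(13)]
best = min(candidates, key=lambda c: sum_squared_diff(data, c))

print(mean, best)                  # 3.0 3.0
print(sum_squared_diff(data, 3))   # 10
print(sum_squared_diff(data, 1))   # 30
```

On this grid the candidate with the smallest sum of squares coincides with the sample mean; any other candidate, such as 1, gives a larger sum.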

Summarizing Data: Some Simple Examples (Cont.)
Median: the value that has an equal number of data points above and below it.
Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5

Summarizing Data: Some Simple Examples (Cont.)
First quartile: the value that is greater than a quarter of the data points.
Third quartile: the value that is greater than three quarters of the data points.
Interquartile range: the difference between the third and first quartiles.
Range: the difference between the largest and smallest data points.

Summarizing Data: Some Simple Examples (Cont.)
Percentiles: the value of a variable below which a certain percentage of the observations fall.
Deciles: the percentiles at multiples of 10 (the 10th, 20th, ..., 90th percentiles).
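The median, quartiles, ranges, and percentiles described above can be computed as in the following sketch. It assumes NumPy is available and uses NumPy's default linear interpolation between data points; other quartile conventions exist and can give slightly different values on small data sets.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

median = np.median(data)
median_even = np.median([1, 2, 3, 4, 5, 6])       # average of the two middle values
q1, q3 = np.percentile(data, [25, 75])            # first and third quartiles
iqr = q3 - q1                                     # interquartile range
data_range = data.max() - data.min()              # range
deciles = np.percentile(data, np.arange(10, 100, 10))

print(median, median_even)       # 3.0 3.5
print(q1, q3, iqr, data_range)   # 2.0 4.0 2.0 4
print(deciles)
```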

Summarizing Data: Some Simple Examples (Cont.)
Mode: the value that occurs most frequently in a data set or a probability distribution.
Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Modes = 1 and 4
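A minimal sketch of finding the mode, including the case where several values tie for the highest count. The function name `modes` is an assumption made for this illustration.

```python
from collections import Counter

def modes(data):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]))  # [6]
print(modes([1, 1, 2, 4, 4]))                        # [1, 4]
```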

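Summarizing Data: Some Simple Examples (Cont.)
Unimodal: a data set or a distribution with one mode.
Bimodal: a data set or a distribution with two modes.
Multimodal: a data set or a distribution with more than one mode.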

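Summarizing Data: Some Simple Examples (Cont.)
Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x(i) - \mu)^2$
If μ is replaced with $\hat{\mu}$, then the variance is estimated as $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x(i) - \hat{\mu})^2$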

Summarizing Data: Some Simple Examples (Cont.)
Standard deviation: the square root of the variance, $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x(i) - \mu)^2}$
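The following sketch computes the variance and standard deviation under the usual conventions, which are assumed here: divide by n when the true mean μ is known, and by n − 1 when the sample mean is substituted for it. The function name and its `mu` parameter are illustrative.

```python
import math

def variance(data, mu=None):
    """Variance about a known mean mu (divide by n), or, if mu is None,
    the estimate about the sample mean (divide by n - 1)."""
    n = len(data)
    if mu is not None:
        return sum((x - mu) ** 2 for x in data) / n
    mu_hat = sum(data) / n
    return sum((x - mu_hat) ** 2 for x in data) / (n - 1)

data = [1, 2, 3, 4, 5]
var_known_mean = variance(data, mu=3)   # true mean assumed known
var_estimated = variance(data)          # sample mean substituted
std_estimated = math.sqrt(var_estimated)

print(var_known_mean, var_estimated)    # 2.0 2.5
print(std_estimated)
```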

Summarizing Data: Some Simple Examples (Cont.)
Skewness: measures whether or not a distribution has a single long tail. A distribution is said to be right-skewed if the long tail extends in the direction of increasing values, and left-skewed if it extends in the direction of decreasing values. Symmetric distributions have zero skewness.
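The slide gives no formula for skewness, so the sketch below assumes one common choice, the moment coefficient m3 / m2^(3/2), where m2 and m3 are the second and third central moments. The three data sets are hypothetical examples chosen to show the sign of the result.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]        # long tail toward larger values
left_skewed = [-10, -3, -2, -2, -1, -1]   # long tail toward smaller values

print(skewness(symmetric))      # zero for a symmetric data set
print(skewness(right_skewed))   # positive
print(skewness(left_skewed))    # negative
```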

Tools for Displaying Single Variable
[Figure: Histogram-1]

Tools for Displaying Single Variable (Cont.)
[Figure: Histogram-2]

Tools for Displaying Single Variable (Cont.)
Kernel estimate
A single variable X has measured values {x(1), x(2), ..., x(n)}.
$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K(x - x(i), h)$
K(): kernel function, commonly a Gaussian curve
h: width

Tools for Displaying Single Variable (Cont.)
Gaussian curve: $K(t, h) = C e^{-\frac{1}{2}(t/h)^2}$
C: normalization constant
t = x - x(i)
h: standard deviation
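A minimal sketch of a Gaussian kernel estimate, assuming the estimate is the average of one Gaussian curve centered on each data point and that the normalization constant is C = 1 / (h sqrt(2π)), so that each curve integrates to one. The sample values, the grid, and the width h = 0.8 are hypothetical.

```python
import numpy as np

def gaussian_kernel(t, h):
    """Gaussian curve with standard deviation h, normalized to integrate to 1."""
    c = 1.0 / (h * np.sqrt(2.0 * np.pi))
    return c * np.exp(-0.5 * (t / h) ** 2)

def kernel_estimate(x, samples, h):
    """Average of the kernels centered on each sample, evaluated at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    t = x[:, None] - samples[None, :]       # t = x - x(i) for every pair
    return gaussian_kernel(t, h).mean(axis=1)

samples = [1.0, 2.0, 2.5, 4.0, 7.0]         # hypothetical measurements
grid = np.linspace(-5.0, 15.0, 2001)
density = kernel_estimate(grid, samples, h=0.8)

# Riemann-sum check that the estimate integrates to approximately 1.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller h gives a spikier estimate that follows individual points; a larger h gives a smoother one.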

Tools for Displaying Single Variable (Cont.)
[Figure: Box and whisker plot]

Tools for Displaying Relationships between Two Variables
[Figure: Scatterplot]

Tools for Displaying Relationships between Two Variables (Cont.)
[Figure: Contour plot]

Tools for Displaying More Than Two Variables
[Figure: Scatterplot matrix]

Tools for Displaying More Than Two Variables (Cont.)
[Figure: Trellis plot]

Tools for Displaying More Than Two Variables (Cont.)
[Figure: Star plot]

Tools for Displaying More Than Two Variables (Cont.)
[Figure: Chernoff’s face]

Tools for Displaying More Than Two Variables (Cont.)
[Figure: Parallel coordinates plot]

Principal Components Analysis
Objective: to find the vectors onto which the data can be projected while keeping maximum variance.
Advantage: this method can reduce the dimensionality of the data.

Principal Components Analysis (Cont.)
Suppose X is an n × p data matrix in which each row is a data vector x and the columns represent the variables.
X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).

Principal Components Analysis (Cont.)
Let a be a p × 1 column vector of projection weights. The projection of a data vector x along a is $a^T x$.
Projecting all the data vectors in X onto a gives Xa, an n × 1 column vector of projected values.

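Principal Components Analysis (Cont.)
Define the variance along a as $\sigma_a^2 = (Xa)^T (Xa) = a^T X^T X a = a^T V a$
$V = X^T X$: the p × p covariance matrix of the data (up to a constant factor, since X is mean-centered)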

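Principal Components Analysis (Cont.)
Impose the constraint $a^T a = 1$ and use a Lagrange multiplier λ to find the a that maximizes the variance along a, that is, maximize $u = a^T V a - \lambda (a^T a - 1)$.
Differentiating with respect to a yields $\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0$, that is, $(V - \lambda I)a = 0$.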

Principal Components Analysis (Cont.)
The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V.
The second principal component is the eigenvector associated with the second largest eigenvalue; its direction is orthogonal to the first, and so on.
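The eigenvector characterization of the principal components can be illustrated with NumPy. This is a sketch under stated assumptions: the data are synthetic (generated with a hypothetical mixing matrix), V is taken as X^T X for a mean-centered X, and `numpy.linalg.eigh` is used because V is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points with 3 correlated variables.
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
raw = rng.normal(size=(200, 3)) @ mixing

X = raw - raw.mean(axis=0)            # mean-center each column
V = X.T @ X                           # p x p matrix (covariance up to a constant)

eigvals, eigvecs = np.linalg.eigh(V)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]     # reorder from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1, a2 = eigvecs[:, 0], eigvecs[:, 1]  # first and second principal components
var_along_a1 = a1 @ V @ a1             # variance along a1 equals the top eigenvalue

# Any other unit vector gives a variance no larger than the one along a1.
b = rng.normal(size=3)
b /= np.linalg.norm(b)
var_along_b = b @ V @ b

print(np.isclose(var_along_a1, eigvals[0]), var_along_b <= var_along_a1)
print(abs(a1 @ a2) < 1e-10)            # the two directions are orthogonal
```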

Principal Components Analysis (Cont.)
If the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as $\sum_{j=1}^{k} \lambda_j$
$\lambda_j$: the j-th eigenvalue

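Principal Components Analysis (Cont.)
The loss of data: when only the first k eigenvectors are kept, the proportion of variance lost is $\frac{\sum_{j=k+1}^{p} \lambda_j}{\sum_{j=1}^{p} \lambda_j}$, where $\lambda_j$ is the j-th eigenvalue.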

Principal Components Analysis (Cont.)
[Figure: Scree plot]

Principal Components Analysis (Cont.)
Ex.
269.8  38.9  50.5
272.4  39.5  50.0
272.0  39.3  50.2
268.2  38.6  50.2
268.2  38.6  50.8
267.0  38.2  51.1
267.8  38.4  51.0
273.6  39.6  50.0
271.2  39.1  50.4
270.0  38.9  50.5
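As an illustration, the example data can be run through the eigenvalue computation to see how much variance each component carries. Reading the flattened numbers as a 10 × 3 table is an assumption, as is working with the unstandardized columns; the slides' own results are not available, so no expected values are quoted.

```python
import numpy as np

# Example data, read as 10 observations of 3 variables (assumed layout).
X_raw = np.array([
    [269.8, 38.9, 50.5],
    [272.4, 39.5, 50.0],
    [272.0, 39.3, 50.2],
    [268.2, 38.6, 50.2],
    [268.2, 38.6, 50.8],
    [267.0, 38.2, 51.1],
    [267.8, 38.4, 51.0],
    [273.6, 39.6, 50.0],
    [271.2, 39.1, 50.4],
    [270.0, 38.9, 50.5],
])

X = X_raw - X_raw.mean(axis=0)           # mean-center each column
V = X.T @ X
eigvals = np.linalg.eigvalsh(V)[::-1]    # eigenvalues, largest first

explained = eigvals / eigvals.sum()      # heights of a scree-style plot
cumulative = np.cumsum(explained)        # variance kept by the first k components

k = 1
loss = eigvals[k:].sum() / eigvals.sum() # proportion lost when keeping k components

print(explained)
print(cumulative)
print(loss)
```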

Multidimensional Scaling
Objective: to represent the data points in a lower-dimensional space while preserving, as far as possible, the distances between the data points.

Multidimensional Scaling (Cont.)
Classical multidimensional scaling
Metric multidimensional scaling
Non-metric multidimensional scaling

Multidimensional Scaling (Cont.)
Assume a 3 × 2 data matrix X in which the mean of each variable is zero.
Then compute the 3 × 3 matrix $B = XX^T$, whose elements are $b_{ij} = \sum_{k=1}^{2} x_{ik} x_{jk}$.

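Multidimensional Scaling (Cont.)
The squared Euclidean distance between objects 1 and 2 is $d_{12}^2 = (x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 = b_{11} + b_{22} - 2b_{12}$, where $b_{ij}$ denotes element (i, j) of B.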

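Multidimensional Scaling (Cont.)
Define a 3 × 3 distance matrix D that collects the pairwise distances between the objects: element (i, j) of D corresponds to $d_{ij}$, so D is symmetric with zeros on its diagonal.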

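Multidimensional Scaling (Cont.)
Applying singular value decomposition to B gives a factorization of the form $B = U \Lambda U^T$, where the columns of U are the eigenvectors of B and Λ is the diagonal matrix of its eigenvalues. The coordinates are then recovered as $X = U \Lambda^{1/2}$.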

Multidimensional Scaling (Cont.)
We keep the first r eigenvalues, those that are large compared with the others; r determines the number of dimensions we map into.

Multidimensional Scaling (Cont.)
Ex.
Data:
1  2  8
3  4  5
5  6  9

Eigenvalues: 16.9641, 7.7025, 0

Transformed data:
-2.4621   1.5436
-0.7528  -2.2085
 3.2149   0.6649

Distance (original data):
0       4.1231  5.7446
4.1231  0       4.8990
5.7446  4.8990  0

Distance (transformed data):
0       4.1231  5.7446
4.1231  0       4.8990
5.7446  4.8990  0

Stress: 1.0325e-016
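The example can be reproduced with a short classical MDS sketch. It assumes the data are three points in three dimensions, forms B from the mean-centered data, and uses `numpy.linalg.eigh` instead of a general SVD (for a symmetric positive semidefinite B the two give the same factorization). The signs of the recovered coordinates are arbitrary, so only eigenvalues and distances are compared.

```python
import numpy as np

data = np.array([[1.0, 2.0, 8.0],
                 [3.0, 4.0, 5.0],
                 [5.0, 6.0, 9.0]])

X = data - data.mean(axis=0)           # center each variable at zero
B = X @ X.T                            # 3 x 3 matrix of inner products

eigvals, eigvecs = np.linalg.eigh(B)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

r = 2                                  # keep the two largest eigenvalues
Y = eigvecs[:, :r] * np.sqrt(eigvals[:r])   # coordinates in r dimensions

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

D_orig = pairwise_distances(data)
D_new = pairwise_distances(Y)

print(np.round(eigvals[:r], 4))        # [16.9641  7.7025]
print(np.round(D_orig, 4))
print(np.allclose(D_orig, D_new))      # True: distances are preserved
```

Because the third eigenvalue is zero, two dimensions are enough to reproduce all three pairwise distances exactly, which is why the stress in the example is at the level of rounding error.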

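Multidimensional Scaling (Cont.)
Stress: $\sqrt{\frac{\sum_i \sum_j (\delta_{ij} - d_{ij})^2}{\sum_i \sum_j d_{ij}^2}}$
$\delta_{ij}$: the observed distance between points i and j in the p-dimensional space.
$d_{ij}$: the distance between the points representing these objects in the two-dimensional space.
Sstress: the corresponding measure computed on squared distances.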
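A hedged sketch of computing stress and sstress. The formulas are assumptions, since the slide's own expressions are not available: stress is taken as the square root of the sum of squared differences between observed and fitted distances divided by the sum of squared fitted distances, and sstress as the same ratio on squared distances. The observed distances are the pairwise distances of the points (1,2,8), (3,4,5), (5,6,9); the inflated configuration is hypothetical.

```python
import numpy as np

def stress(delta, d):
    """Assumed form: sqrt(sum (delta - d)^2 / sum d^2) over pairs i < j."""
    iu = np.triu_indices_from(delta, k=1)
    return np.sqrt(((delta[iu] - d[iu]) ** 2).sum() / (d[iu] ** 2).sum())

def sstress(delta, d):
    """Assumed form: the same ratio computed on squared distances."""
    iu = np.triu_indices_from(delta, k=1)
    num = ((delta[iu] ** 2 - d[iu] ** 2) ** 2).sum()
    return np.sqrt(num / (d[iu] ** 4).sum())

# Observed distances between the three example points.
a, b, c = np.sqrt(17.0), np.sqrt(33.0), np.sqrt(24.0)
delta = np.array([[0.0, a, b],
                  [a, 0.0, c],
                  [b, c, 0.0]])

perfect = delta.copy()      # a configuration that preserves every distance
inflated = 1.1 * delta      # hypothetical configuration with 10% larger distances

print(stress(delta, perfect))              # 0.0
print(round(stress(delta, inflated), 4))   # 0.0909
print(round(sstress(delta, inflated), 4))  # 0.1736
```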
