Visualizing and Exploring Data



Outline
1. Introduction
2. Summarizing Data: Some Simple Examples
3. Tools for Displaying Single Variable
4. Tools for Displaying Relationships between Two Variables
5. Tools for Displaying More Than Two Variables
6. Principal Components Analysis
7. Multidimensional Scaling

Introduction
Visual methods are well suited to sifting through data to find unexpected relationships.
Exploratory data analysis aims to find structure that may indicate deeper relationships between cases or variables.

Summarizing Data: Some Simple Examples
Measures of location: mean, median, first quartile, third quartile, deciles, percentiles, mode

Summarizing Data: Some Simple Examples(Cont.)
Suppose that x(1), x(2), ..., x(n) comprise a set of n data values.
Sample mean: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x(i)$
μ: true mean of the population
$\hat{\mu}$: estimate of the true mean

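Summarizing Data: Some Simple Examples(Cont.)
The sample mean minimizes the sum of squared differences between itself and the data values.
Ex. data set {1,2,3,4,5}: with the candidate value 3 (the sample mean) the sum of squared differences is 10; with the candidate value 1 it is 30.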
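The minimizing property of the sample mean can be checked numerically. The following is a minimal sketch using the slide's data set; the helper name `sum_squared_diff` is chosen here for illustration and does not come from the slides.

```python
def sum_squared_diff(data, center):
    """Sum of squared differences between each data value and a candidate center."""
    return sum((x - center) ** 2 for x in data)


data = [1, 2, 3, 4, 5]              # data set from the slide
sample_mean = sum(data) / len(data)

# Compare the sample mean with another candidate value (1, as on the slide).
sse_at_mean = sum_squared_diff(data, sample_mean)
sse_at_one = sum_squared_diff(data, 1)

print(sample_mean, sse_at_mean, sse_at_one)
```

Any other candidate value gives a sum of squared differences at least as large as the one obtained at the sample mean.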

Summarizing Data: Some Simple Examples(Cont.)
Median: The value that has an equal number of data points above and below it.
Ex. data set {1,2,3,4,5}: Median = 3
Ex. data set {1,2,3,4,5,6}: Median = (3+4)/2 = 3.5

Summarizing Data: Some Simple Examples(Cont.)
First quartile: The value that is greater than a quarter of the data points.
Third quartile: The value that is greater than three quarters of the data points.
Interquartile range: The difference between the third and first quartiles.
Range: The difference between the largest and smallest data points.

Summarizing Data: Some Simple Examples(Cont.)
Percentiles: The value of a variable below which a certain percent of observations fall.
Deciles: The percentiles at multiples of 10 (the 10th, 20th, ..., 90th percentiles).
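The location measures above can be computed with Python's standard `statistics` module. This is a sketch only: the two median examples come from the slides, while the data set 1..11 is a hypothetical one chosen here. Quartile conventions differ between textbooks and tools, so the values depend on the interpolation method (`statistics.quantiles` uses its "exclusive" method by default).

```python
import statistics

# Median examples taken from the slides.
median_odd = statistics.median([1, 2, 3, 4, 5])
median_even = statistics.median([1, 2, 3, 4, 5, 6])

# Hypothetical data set (not from the slides) for quartiles and deciles.
data = list(range(1, 12))

# n=4 returns the three cut points: first quartile, median, third quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                        # interquartile range
data_range = max(data) - min(data)   # range

# n=10 returns the nine deciles.
deciles = statistics.quantiles(data, n=10)

print(median_odd, median_even, q1, q3, iqr, data_range, deciles)
```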

Summarizing Data: Some Simple Examples(Cont.)
Mode: The value that occurs most frequently in a data set or a probability distribution.
Ex. data set {1,3,6,6,6,6,7,7,12,12,17}: Mode = 6
Ex. data set {1,1,2,4,4}: Mode = 1, 4
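Both mode examples can be reproduced with `statistics.multimode` from the Python standard library, which returns every value tied for the highest frequency. A minimal sketch using the slide's two data sets:

```python
import statistics

single_mode = statistics.multimode([1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17])
two_modes = statistics.multimode([1, 1, 2, 4, 4])

print(single_mode)  # one most frequent value
print(two_modes)    # two values tied for most frequent
```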

Summarizing Data: Some Simple Examples(Cont.)
Unimodal: A data set or a distribution with one mode.
Bimodal: A data set or a distribution with two modes.
Multimodal: A data set or a distribution with several modes.

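Summarizing Data: Some Simple Examples(Cont.)
Variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x(i) - \mu)^2$
If μ is replaced with the sample mean $\hat{\mu}$, then the variance is estimated as $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x(i) - \hat{\mu})^2$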

Summarizing Data: Some Simple Examples(Cont.)
Standard deviation: The square root of the variance.
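As a sketch of how the two variance formulas differ, the snippet below divides the same sum of squares by n and by n - 1, then takes the square root to get the standard deviation. The data set {1,2,3,4,5} is reused from the earlier examples; the variable names are chosen here for illustration.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)
mu_hat = sum(data) / n                        # sample mean

# Sum of squared deviations from the sample mean.
ss = sum((x - mu_hat) ** 2 for x in data)

var_divide_n = ss / n                         # divisor n (mean treated as known)
var_divide_n_minus_1 = ss / (n - 1)           # divisor n - 1 (mean estimated from the data)
std_dev = math.sqrt(var_divide_n_minus_1)     # standard deviation

print(var_divide_n, var_divide_n_minus_1, std_dev)
```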

Summarizing Data: Some Simple Examples(Cont.)
Skewness: It measures whether or not a distribution has a single long tail. A distribution is said to be right-skewed if the long tail extends in the direction of increasing values, and left-skewed if the long tail extends in the direction of decreasing values. Symmetric distributions have zero skewness.
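The slides do not preserve a skewness formula, so the sketch below assumes one common choice, the moment coefficient (third central moment divided by the 1.5 power of the second central moment). The data sets are hypothetical and only illustrate the sign of the result.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]            # long tail toward larger values
left_skewed = [-x for x in right_skewed]      # mirror image, long tail toward smaller values

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

The symmetric set gives zero, the right-skewed set a positive value, and its mirror image the same value with a negative sign.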

Tools for Displaying Single Variable
Histogram-1

Tools for Displaying Single Variable(Cont.)
Histogram-2

Tools for Displaying Single Variable(Cont.)
Kernel estimate
A single variable X with measured values {x(1), x(2), ..., x(n)}
$\hat{f}(x) = \frac{1}{n}\sum_{i=1}^{n} K(x - x(i), h)$
K(): kernel function, commonly a Gaussian curve
h: width

Tools for Displaying Single Variable(Cont.)
Gaussian curve
$K(t, h) = C e^{-\frac{1}{2}(t/h)^2}$
C: normalization constant
t = x - x(i)
h: standard deviation
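A kernel estimate with a Gaussian kernel can be sketched in a few lines. The code assumes the normalization constant is C = 1 / (h * sqrt(2π)), which makes each kernel integrate to one; the sample values and the width h = 0.5 are hypothetical.

```python
import math


def gaussian_kernel(t, h):
    """Gaussian curve with standard deviation h, evaluated at t = x - x(i)."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))   # assumed normalization constant C
    return c * math.exp(-0.5 * (t / h) ** 2)


def kernel_estimate(x, samples, h):
    """Average of the kernels centered on each measured value."""
    return sum(gaussian_kernel(x - xi, h) for xi in samples) / len(samples)


samples = [1.0, 1.5, 2.0, 4.0, 4.2]   # hypothetical measured values
h = 0.5                               # hypothetical kernel width

# Evaluate the estimate on a grid and check that it integrates to about 1.
step = 0.01
grid = [i * step for i in range(-500, 1101)]
density = [kernel_estimate(x, samples, h) for x in grid]
area = sum(density) * step

print(round(area, 4))
```

A larger h gives a smoother estimate and a smaller h follows the individual data points more closely.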


Tools for Displaying Single Variable(Cont.)
Box and whisker plot

Tools for Displaying Relationships between Two Variables
Scatterplot

Tools for Displaying Relationships between Two Variables(Cont.)
Contour plot

Tools for Displaying More Than Two Variables
Scatterplot matrix

Tools for Displaying More Than Two Variables(Cont.)
Trellis plot

Tools for Displaying More Than Two Variables(Cont.)
Star plot

Tools for Displaying More Than Two Variables(Cont.)
Chernoff's faces

Tools for Displaying More Than Two Variables(Cont.)
Parallel coordinates plot

Principal Components Analysis
Objective: To find the vectors onto which the projected data keep maximum variance.
Advantage: This method can reduce the dimensionality of the data.

Principal Components Analysis(Cont.)
Suppose X is an n × p data matrix in which each row is a data vector x and the columns represent the variables.
X is mean-centered (i.e., each column has had the sample mean of that variable subtracted).

Principal Components Analysis(Cont.)
Let a be a p × 1 column vector of projection weights. The projection of a data vector x along a is $a^T x$.
Projecting all data vectors in X onto a gives Xa, an n × 1 column vector of projected values.

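Principal Components Analysis(Cont.)
Define the variance along a as $\sigma_a^2 = (Xa)^T (Xa) = a^T X^T X a = a^T V a$
$V = X^T X$: the p × p covariance matrix of the data (up to a constant scaling factor, since X is mean-centered)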

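Principal Components Analysis(Cont.)
Impose the constraint $a^T a = 1$ and use a Lagrange multiplier λ to find the a that maximizes the variance along a: maximize $u = a^T V a - \lambda (a^T a - 1)$.
Differentiating with respect to a yields $\frac{\partial u}{\partial a} = 2Va - 2\lambda a = 0$, i.e. $(V - \lambda I)a = 0$.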

Principal Components Analysis(Cont.)
The first principal component a is the eigenvector associated with the largest eigenvalue of the covariance matrix V.
The second principal component is the eigenvector associated with the second largest eigenvalue, and its direction is orthogonal to the first, and so on.

Principal Components Analysis(Cont.)
When the data are projected onto the first k eigenvectors, the variance of the projected data can be expressed as $\sum_{j=1}^{k} \lambda_j$
$\lambda_j$: the j th eigenvalue

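Principal Components Analysis(Cont.)
The loss of data: the proportion of variance lost by keeping only the first k eigenvectors is $\frac{\sum_{j=k+1}^{p} \lambda_j}{\sum_{l=1}^{p} \lambda_l}$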
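The PCA steps can be sketched with NumPy as follows. The data are randomly generated (hypothetical), V is taken as X^T X without a scaling factor, and `numpy.linalg.eigh` is used because V is symmetric. This is an illustrative sketch, not the example shown on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 200 cases, p = 3 correlated variables.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
raw = rng.normal(size=(200, 3)) @ mixing

X = raw - raw.mean(axis=0)   # mean-center each column
V = X.T @ X                  # p x p matrix (covariance up to a constant factor)

# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(V)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

a1 = eigvecs[:, 0]           # first principal component direction

k = 2
projected = X @ eigvecs[:, :k]              # n x k projected values
retained = eigvals[:k].sum()                # variance kept by the first k components
loss = eigvals[k:].sum() / eigvals.sum()    # proportion of variance lost

print(eigvals, loss)
```

Plotting `eigvals` against the component number gives the scree plot used to choose k.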

Principal Components Analysis(Cont.)
Scree plot

Principal Components Analysis(Cont.)
Ex.

Multidimensional Scaling
Objective: To represent the data points in a lower-dimensional space while preserving, as far as possible, the distances between the data points.

Multidimensional Scaling(Cont.)
Classical multidimensional scaling
Metric multidimensional scaling
Non-metric multidimensional scaling

Multidimensional Scaling(Cont.)
Assume a 3×2 data matrix X in which the mean of each variable is zero. Then compute the 3×3 matrix $B = XX^T$.

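Multidimensional Scaling(Cont.)
The squared Euclidean distance between objects 1 and 2 is $d_{12}^2 = (x_{11} - x_{21})^2 + (x_{12} - x_{22})^2 = b_{11} + b_{22} - 2b_{12}$, where $b_{ij}$ is the (i, j) entry of $B = XX^T$.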

Multidimensional Scaling(Cont.)
Define a 3×3 distance matrix D that holds the pairwise distances between the three objects.

Multidimensional Scaling(Cont.)
Apply singular value decomposition to B: $B = V \Lambda V^T$, where Λ is the diagonal matrix of eigenvalues of B and the columns of V are the corresponding eigenvectors.

Multidimensional Scaling(Cont.)
We can keep the first r eigenvalues that are much larger than the others; r is the number of dimensions we map the data into.
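The classical scaling steps can be sketched with NumPy. The 3×2 data matrix below is hypothetical, and the double-centering step that recovers B from the squared distances is the standard classical MDS construction, assumed here because the corresponding slides did not survive in the transcript.

```python
import numpy as np

# Hypothetical 3 x 2 data matrix, then mean-center each variable.
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 6.0]])
X = X - X.mean(axis=0)

B = X @ X.T                                  # 3 x 3 matrix of inner products
b = np.diag(B)
D2 = b[:, None] + b[None, :] - 2.0 * B       # squared distances: b_ii + b_jj - 2 b_ij

# Recover B from the squared distances alone by double centering.
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B_rec = -0.5 * J @ D2 @ J

# B is symmetric, so its SVD coincides with its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(B_rec)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

r = 2                                        # number of large eigenvalues kept
Y = eigvecs[:, :r] * np.sqrt(np.clip(eigvals[:r], 0.0, None))

# Squared distances between the transformed points.
diff = Y[:, None, :] - Y[None, :, :]
D2_Y = (diff ** 2).sum(axis=-1)

print(eigvals)
print(np.abs(D2_Y - D2).max())
```

Because the hypothetical data are two-dimensional, keeping r = 2 reproduces the original distances up to floating-point error, and the third eigenvalue is numerically zero.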

Multidimensional Scaling(Cont.)
Ex. (panels: data, eigenvalues, distance, transformed data, stress, distance)

Multidimensional Scaling(Cont.)
Stress
$\delta_{ij}$: The observed distance between points i and j in the p-dimensional space.
$d_{ij}$: The distance between the points representing these objects in the two-dimensional space.
Sstress
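The exact stress formula is not preserved in the transcript. The sketch below assumes one common definition, the square root of the sum of squared differences between observed and fitted distances divided by the sum of squared fitted distances, and an unnormalized sstress built from squared distances. The two distance matrices are hypothetical.

```python
import math


def stress(observed, fitted):
    """sqrt( sum (delta_ij - d_ij)^2 / sum d_ij^2 ) over pairs i < j (assumed definition)."""
    n = len(observed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    num = sum((observed[i][j] - fitted[i][j]) ** 2 for i, j in pairs)
    den = sum(fitted[i][j] ** 2 for i, j in pairs)
    return math.sqrt(num / den)


def sstress(observed, fitted):
    """sum (delta_ij^2 - d_ij^2)^2 over pairs i < j (assumed, unnormalized)."""
    n = len(observed)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum((observed[i][j] ** 2 - fitted[i][j] ** 2) ** 2 for i, j in pairs)


# Hypothetical distances: observed in the original space, fitted in the low-dimensional map.
observed = [[0, 3, 4],
            [3, 0, 5],
            [4, 5, 0]]
fitted = [[0, 3, 4],
          [3, 0, 4],
          [4, 4, 0]]

print(stress(observed, fitted), sstress(observed, fitted))
```

A stress of zero means the low-dimensional map reproduces every observed distance exactly.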