10b. Univariate Analysis Part 2 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science, IUPUI

Presentation transcript:

10b. Univariate Analysis Part 2 CSCI N207 Data Analysis Using Spreadsheet Lingma Acheson Department of Computer and Information Science, IUPUI

The Range
The difference between the minimum and maximum values in a data set. A larger range usually (but not always) indicates a larger spread in the values of the data set.
(73, 66, 69, 67, 49, 60, 81, 71, 78, 62, 53, 87, 74, 65, 74, 50, 85, 45, 63, 100)
Range: 100 - 45 = 55
A single extreme low or high value can throw off the range, e.g.
(20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100)
Range: 100 - 20 = 80
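As a quick check of the two examples above, the range can be computed in a few lines. This is a minimal Python sketch, not part of the original spreadsheet-based slides; the helper name `value_range` and the variable names are assumptions made for illustration.

```python
# Data sets copied from the slide above.
scores = [73, 66, 69, 67, 49, 60, 81, 71, 78, 62,
          53, 87, 74, 65, 74, 50, 85, 45, 63, 100]
with_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]


def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


print(value_range(scores))            # 55
print(value_range(with_outlier))      # 80
# Dropping the single extreme value 20 shrinks the range sharply.
print(value_range(with_outlier[1:]))  # 24
```

The last line illustrates the slide's caveat: one extreme value accounts for most of the range of 80.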

Variance
One measure of the dispersion of a data set around its mean: how far is each data point from the mean?
Variance is the average squared distance to the mean. The larger the variance, the farther the data points lie from the mean on average.
E.g. 73, 67, 70, 67, 49, 60, 81, 71, 78, 62, 53, 87, 72, 65, 74, 50, 84, 45, 62, 100
Variance = $\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \dots + (x_{20} - \bar{x})^2}{20}$, where $\bar{x}$ is the average value of the data set.
Excel functions:
VARP(): variance for the whole population (data set is complete)
VAR(): variance estimated from a sample (data set is a sample)
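The calculation can be sketched outside a spreadsheet as well. The Python below is an illustration only, assuming the 20 values on this slide; it computes the variance by hand and compares it with the standard library's `statistics.pvariance` (divides by n, like VARP()) and `statistics.variance` (divides by n - 1, like VAR()). The variable names are assumptions.

```python
import statistics

# Data set copied from the slide above.
data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

mean = sum(data) / len(data)  # 68.5
squared_distances = [(x - mean) ** 2 for x in data]

# Whole population: divide the sum of squared distances by n.
population_variance = sum(squared_distances) / len(data)
# Sample: divide by n - 1 instead.
sample_variance = sum(squared_distances) / (len(data) - 1)

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```

For this data set the population variance works out to 179.05, and the sample variance is slightly larger because of the smaller divisor.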

Standard Deviation
The square root of the variance. Because the variance is built from squared distances, taking the square root brings the magnitude back in line with the values in the data set. It can be thought of, roughly, as the typical deviation from the mean of a data set.
Standard Deviation = $\sqrt{\text{Variance}}$
Excel functions:
STDEVP(): use this when the data set is complete
STDEV(): use this when the data set is a sample
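A short Python sketch of the same idea follows. It assumes the data set from the variance slide, since this slide gives none; `statistics.pstdev` plays the role of STDEVP() and `statistics.stdev` the role of STDEV().

```python
import math
import statistics

# Assumption: reuse the data set from the variance slide.
data = [73, 67, 70, 67, 49, 60, 81, 71, 78, 62,
        53, 87, 72, 65, 74, 50, 84, 45, 62, 100]

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(statistics.pvariance(data))
sample_sd = statistics.stdev(data)

print(round(population_sd, 2))  # about 13.38
print(round(sample_sd, 2))      # about 13.73
```

Both results are on the same scale as the scores themselves, unlike the variance of 179.05.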

Frequency Tables
Use a frequency table to observe the distribution. E.g. consider the following data set:
{45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
First decide how to group the data into bins.
[Frequency table with columns Category Labels and Frequency; the only row recoverable here is >90 with frequency 1.]
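One way to build such a table is sketched below in Python. The bin boundaries are an assumption: only the last bin (>90, frequency 1) is visible here, so bins of width 10 starting at "<=50" were chosen for illustration. Each upper edge is treated as inclusive.

```python
from bisect import bisect_left

# Data set copied from the slide above.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

# Hypothetical bins: inclusive upper edges, plus a final open-ended bin.
upper_edges = [50, 60, 70, 80, 90]
labels = ["<=50", "51-60", "61-70", "71-80", "81-90", ">90"]

counts = [0] * len(labels)
for value in data:
    # bisect_left returns the first bin whose upper edge is >= value;
    # values above every edge fall into the last (">90") bin.
    counts[bisect_left(upper_edges, value)] += 1

for label, count in zip(labels, counts):
    print(f"{label:>6}  {count}")
```

With these assumed bins the last row matches the surviving ">90, 1" entry, and the counts sum to the 20 values in the data set.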

Histogram
A histogram is simply a column chart of the frequency table.
[Figure: column chart of Frequency by Category Labels.]

Data Distribution
[Figure: Frequency by Category Labels.]

Normal Distributions
The bell curve:
Symmetrical
Mean ≈ Median

Skewed Distributions
Most of the time, distributions are skewed.
Positively skewed distribution: mean > median
Negatively skewed distribution: mean < median
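The mean-versus-median comparison above can be tried on the data sets used elsewhere in this presentation. The Python sketch below is illustrative only: the function name `skew_direction` is an assumption, and the comparison is the slide's rule of thumb, not a formal skewness statistic.

```python
import statistics


def skew_direction(values):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none"


# Data sets copied from other slides in this presentation.
scores = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
          69, 71, 73, 74, 74, 78, 81, 85, 87, 100]
with_low_outlier = [20, 76, 77, 80, 82, 82, 84, 88, 90, 93, 99, 100]

print(skew_direction(scores))            # positive (mean 68.6 > median 68)
print(skew_direction(with_low_outlier))  # negative (low outlier pulls the mean down)
```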

Data Distribution
{45, 49, 50, 53, 60, 62, 63, 65, 66, 67, 69, 71, 73, 74, 74, 78, 81, 85, 87, 100}
[Figure: distribution of the data set, marking the average (68.6), the median (68), the mode (74), and the -1 SD and +1 SD points.]
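The three summary values marked on this slide can be reproduced with Python's standard `statistics` module. This is a small sketch for checking the numbers, not part of the original slides.

```python
import statistics

# Data set copied from the slide above.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

average = statistics.mean(data)    # sum of values divided by their count
median = statistics.median(data)   # midpoint of the two middle values (67 and 69)
mode = statistics.mode(data)       # most frequent value

print(average, median, mode)
```

The results agree with the slide: an average of 68.6, a median of 68, and a mode of 74 (the only value that appears twice).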

Standard Deviation
With a normal distribution:
mean ± 1*SD covers about 68% of the data
mean ± 2*SD covers about 95% of the data
mean ± 3*SD covers about 99.7% of the data
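To see how closely a real data set follows these percentages, the share of values within k standard deviations of the mean can be counted directly. The Python sketch below assumes the 20-value data set from the earlier slides and uses the population standard deviation; the helper name `share_within` is hypothetical.

```python
import statistics

# Assumption: the 20-value data set used on the earlier slides.
data = [45, 49, 50, 53, 60, 62, 63, 65, 66, 67,
        69, 71, 73, 74, 74, 78, 81, 85, 87, 100]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population standard deviation


def share_within(values, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    low, high = center - k * spread, center + k * spread
    inside = [x for x in values if low <= x <= high]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(k, share_within(data, mean, sd, k))
```

For this small data set the shares come out to 0.65, 0.95 and 1.0, close to the 68%, 95% and 99.7% expected for a normal distribution even though the data are only roughly bell-shaped.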