Numerical Statistics


Given a set of data (numbers and a context), we are interested in how to describe the entire set without listing all of its elements. Two important characteristics of the set are its center and how “spread out” it is.

Measures of Center
Mode: the observation, or observations, that occur most often.
Mean: the sum of all values divided by the number of observations.
Median: the middle observation after the observations are ordered from smallest to largest.

Median
Order the data from smallest to largest. If the number of observations n is odd, the median is the value in position (n+1)/2 of the ordered list. If n is even, the median is the mean of the two middle values. The Excel function MEDIAN can also be used.
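The rule above can be sketched in a few lines of Python. This is only an illustration of the ordered-list rule; the function name `median` and the sample data are chosen here and are not part of the original slides.

```python
def median(values):
    """Median by the ordered-list rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 counting from 1 is index n//2 counting from 0.
        return ordered[mid]
    # Even n: mean of the two values closest to the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 1, 6, 2, 8]))           # 6
print(median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```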

Mode(s) of a data set
The mode of a set of data is the value that occurs most often. Sometimes this value is unique, but other times it is not. The mode of the set {1, 2, 2, 3} is 2, while the set {1, 2, 3} has three modes: 1, 2, and 3. The Excel function MODE can be used to calculate the mode of a set of numbers, but it returns only one mode even when there are several.
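Because a data set can have several modes, it may help to see a version that returns all of them. The following is a minimal Python sketch; the helper name `modes` is hypothetical, and the two data sets are the ones used in the slide.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3]))  # [2]
print(modes([1, 2, 3]))     # [1, 2, 3]
```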

Geometric Descriptions
Given a histogram (or probability density curve), the modes are located at the highest points on the curve. The median is the point at which the area under the curve to its right equals the area under the curve to its left. The mean is the point at which a cut-out of the curve would balance.

The Mean is the Balance point
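One way to read the balance-point description numerically: the deviations of the observations from the mean cancel out, so their sum is zero. The short Python check below illustrates this with a small data set chosen for the example (it is not taken from this slide).

```python
data = [1, 2, 6, 8, 9]  # illustrative data, an assumption of this sketch

mean = sum(data) / len(data)

# Signed distance of each observation from the mean.
deviations = [x - mean for x in data]

# Values below the mean pull left, values above pull right; the pulls cancel.
balanced = abs(sum(deviations)) < 1e-9
print(mean)      # 5.2
print(balanced)  # True
```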

Measures of Spread: Range
The simplest way to describe how spread out a data set is to subtract the smallest value from the largest. The result is called the range of the data set. For example, {1, 2, 6, 8, 9} has a range of 9 - 1 = 8.

Percentiles
The p-th percentile of a set of data is a value such that at most p% of the data points are less than it and at most (100-p)% of the data points are greater than it.

Percentiles as area

For example, consider the 8 numbers 1, -1, 2, -2, 3, -3, 4, -4. To find the 30th percentile, we first order the data from smallest to largest: -4, -3, -2, -1, 1, 2, 3, 4. Next, we multiply the sample size (8) by the desired proportion, 0.3 (30%). The result is 2.4. Since 2.4 is not an integer, we round up to the next integer, 3. Therefore the third number from the left in the ordered list, -2, is the 30th percentile.

Calculation of Percentiles
To calculate the p-th percentile of a set of data:
1. Order the data from smallest to largest.
2. Multiply the sample size by the desired percentage expressed as a decimal. For example, if the data set contains 50 points, to find the 30th percentile we compute 50 * 0.30 = 15.
3. If this product is not an integer, round it up to the next integer and take the data value at that position in the ordered list. If the product is an integer k, take the average of the k-th data value and the next larger data value in the ordered list.
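The three steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `percentile` is hypothetical, p is restricted to values strictly between 0 and 100, and `divmod` is used so that the "is the product an integer?" test does not depend on floating-point rounding.

```python
def percentile(values, p):
    """p-th percentile by the ordered-list rule (0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    # Step 2: n * p / 100, split into whole part k and remainder.
    k, remainder = divmod(n * p, 100)
    k = int(k)
    if remainder != 0:
        # Step 3a: not an integer, so round up; position k+1 is index k from 0.
        return ordered[k]
    # Step 3b: an integer k, so average the k-th value and the next larger one.
    return (ordered[k - 1] + ordered[k]) / 2


print(percentile([1, -1, 2, -2, 3, -3, 4, -4], 30))  # -2
print(percentile([1, 2, 3, 4, 5, 6, 7, 8], 25))      # 2.5
```

The first call reproduces the 8-number example: 8 * 0.3 = 2.4, which rounds up to position 3, giving -2.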

Percentiles in Excel
Excel calculates percentiles differently from the method described above: it treats the data as continuous rather than discrete and interpolates between data points. As a result, the percentiles calculated by Excel are rarely points in the data set, whereas the method described above always yields either a data point or the value halfway between two adjacent points. We therefore recommend not using the PERCENTILE worksheet function in Excel.

Quartiles
The 25th, 50th, and 75th percentiles are also known as the first, second, and third quartiles of the data set. To calculate these values, we recommend using Excel to find the median, which divides the data in half, and then finding the median of the lower half and the median of the upper half.

Quartiles
For example, take the data set 1, 2, 3, 4, 5, 6, 7, 8. The median is 4.5; this is the second quartile. The median of the lower four numbers gives the first quartile, 2.5. The median of the upper four numbers gives the third quartile, 6.5. Note that these are close to, but not equal to, the values returned by the Excel worksheet function QUARTILE (which is a special case of Excel’s PERCENTILE function).
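The median-of-halves procedure might look like this in Python. The function name `quartiles` is hypothetical. The slides only show an even-sized data set, so the handling of an odd number of observations (the middle value is left out of both halves) is an assumption of this sketch, not something the source specifies.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle observation belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```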

Measuring Spread: Interquartile Range
The interquartile range is the distance between the first and third quartiles, that is, the range of the middle half of the data. This measure is mainly used to describe data sets that have a few scattered points far away from a central group. For example, the set below has a range of 126, suggesting that the data has more spread than is actually the case.
{1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 127}
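To see how differently the two measures react to the single distant point, the sketch below computes both for the set above, using the median-of-halves quartiles. The variable names are chosen for this example, and the simple halving works because the set has an even number of observations.

```python
from statistics import median

data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 127]

ordered = sorted(data)
half = len(ordered) // 2          # 16 observations, so two halves of 8

q1 = median(ordered[:half])       # median of the lower half
q3 = median(ordered[half:])       # median of the upper half

data_range = ordered[-1] - ordered[0]
iqr = q3 - q1

print(data_range)  # 126
print(iqr)         # 2.0
```

The range is dominated by the value 127, while the interquartile range reflects only the central group.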

Quartiles and Box Plots (Five-number summaries)
A box plot is a graphical representation of the smallest value, the first, second, and third quartiles, and the largest value. These five numbers are represented by drawing a box from the first to the third quartile, with a vertical line at the median, and then extending “whiskers” to the smallest and largest values.

Box Plot

Box Plots in Excel
Excel does not have a built-in function to construct box plots, but there is a worksheet called boxplot.xls that you can use to construct box plots from Excel data. This worksheet makes use of one added convention: special handling of outliers. As we have seen, some data sets have scattered points that are far from the center of the data. If these points lie more than 1.5 times the interquartile range beyond the first or third quartile, they are considered “outliers” and are shown as single points unconnected to the box on the box plot.
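The outlier convention can be sketched in Python. This assumes the common reading of the 1.5 * IQR rule, in which the distance is measured below the first quartile and above the third quartile; how boxplot.xls itself computes quartiles and fences is not shown in the slides, so the function `find_outliers` and its details are illustrative only.

```python
from statistics import median


def find_outliers(values):
    """Return the points lying more than 1.5 * IQR outside the quartile box."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    # Assumption: for odd n, the middle observation is left out of both halves.
    q3 = median(ordered[half:] if n % 2 == 0 else ordered[half + 1:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 127]
print(find_outliers(data))  # [127]
```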

Results from Boxplot.xls
[Figure: box plot output with panels labeled “No Outliers” and “Outliers”; axis ticks run from 2.67 to 26.7.]

Measures of Spread: Population Variance
To calculate the variance of a population, first calculate the population mean, $\mu$. Then measure the distance between each observation and the mean, square these distances to get $(x_i - \mu)^2$, and finally average the squared distances by summing them and dividing by the number of observations, $N$: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$.
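The procedure translates almost word for word into Python. The sketch below is illustrative: the function name `population_variance` and the sample data are assumptions made for the example, with the data picked so that the arithmetic comes out in whole numbers.

```python
def population_variance(values):
    """Average squared distance from the population mean (divides by N)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mu = sum(values) / n                          # population mean
    squared = [(x - mu) ** 2 for x in values]     # squared distances from the mean
    return sum(squared) / n                       # average of the squared distances


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, squared distances sum to 32
print(population_variance(data))  # 4.0
```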

Standard deviation
The population standard deviation, $\sigma$, is calculated by taking the square root of the population variance, $\sigma^2$. In Excel it can be computed with =STDEVP(reference cells); note that =STDEV(reference cells) returns the sample standard deviation, which divides by n - 1 rather than n. Generally the standard deviation is about one fourth to one sixth of the range. Although it is tedious to calculate by hand, we shall see that it is by far the most important measure of spread of a data set.
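For readers working outside Excel, Python's standard `statistics` module provides both versions of the calculation: `pstdev` divides by the number of observations (population), while `stdev` divides by one less (sample). The data set below is hypothetical and chosen so that the population value is a whole number.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with population variance 4

sigma = statistics.pstdev(data)   # population standard deviation: sqrt(32 / 8)
s = statistics.stdev(data)        # sample standard deviation: sqrt(32 / 7)

print(math.isclose(sigma, 2.0))   # True
print(s > sigma)                  # True: dividing by n - 1 gives a larger value
```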