STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures.

Slides:



Advertisements
Similar presentations
Chapter 3, Numerical Descriptive Measures
Advertisements

Measures of Dispersion
Statistics for Managers using Microsoft Excel 6th Edition
Descriptive Statistics
Chapter 3 Describing Data Using Numerical Measures
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Basic Business Statistics (10th Edition)
© 2002 Prentice-Hall, Inc.Chap 3-1 Basic Business Statistics (8 th Edition) Chapter 3 Numerical Descriptive Measures.
Chapter 3 Describing Data Using Numerical Measures
Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data Lesson2-1 Lesson 2: Descriptive Statistics.
Chap 3-1 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 3 Describing Data: Numerical.
Descriptive Statistics A.A. Elimam College of Business San Francisco State University.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 2-1 Statistics for Business and Economics 7 th Edition Chapter 2 Describing Data:
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 7 th Edition Chapter.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.Chap 3-1 Statistics for Managers Using Microsoft® Excel 5th Edition.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter Two Treatment of Data.
Statistics for Managers Using Microsoft® Excel 4th Edition
Slides by JOHN LOUCKS St. Edward’s University.
1 Business 260: Managerial Decision Analysis Professor David Mease Lecture 1 Agenda: 1) Course web page 2) Greensheet 3) Numerical Descriptive Measures.
1 Pertemuan 02 Ukuran Numerik Deskriptif Matakuliah: I0262-Statistik Probabilitas Tahun: 2007.
Basic Business Statistics 10th Edition
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Introduction to Statistics Chapter 3 Using Statistics to summarize.
Statistics for Managers using Microsoft Excel 6th Global Edition
Business and Economics 7th Edition
Coefficient of Variation
© 2003 Prentice-Hall, Inc.Chap 3-1 Business Statistics: A First Course (3 rd Edition) Chapter 3 Numerical Descriptive Measures.
Fall 2006 – Fundamentals of Business Statistics 1 Chapter 3 Describing Data Using Numerical Measures.
Chap 3-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 3 Describing Data: Numerical Statistics for Business and Economics.
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Describing Data: Numerical
Describing Data Using Numerical Measures
Numerical Descriptive Measures
Descriptive Statistics  Summarizing, Simplifying  Useful for comprehending data, and thus making meaningful interpretations, particularly in medium to.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Statistics for Managers.
Modified by ARQ, from © 2002 Prentice-Hall.Chap 3-1 Numerical Descriptive Measures Chapter %20ppts/c3.ppt.
Descriptive Statistics Roger L. Brown, Ph.D. Medical Research Consulting Middleton, WI Online Course #1.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 3-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
What is variability in data? Measuring how much the group as a whole deviates from the center. Gives you an indication of what is the spread of the data.
Chapter 2 Describing Data.
Descriptive Statistics1 LSSG Green Belt Training Descriptive Statistics.
Lecture 3 Describing Data Using Numerical Measures.
Applied Quantitative Analysis and Practices LECTURE#09 By Dr. Osman Sadiq Paracha.
Variation This presentation should be read by students at home to be able to solve problems.
BUS304 – Data Charaterization1 Other Numerical Measures  Median  Mode  Range  Percentiles  Quartiles, Interquartile range.
Lecture 5 Dustin Lueker. 2 Mode - Most frequent value. Notation: Subscripted variables n = # of units in the sample N = # of units in the population x.
INVESTIGATION 1.
Chap 3-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 3 Describing Data Using Numerical.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Business Statistics, A First Course.
Business Statistics Spring 2005 Summarizing and Describing Numerical Data.
Basic Business Statistics Chapter 3: Numerical Descriptive Measures Assoc. Prof. Dr. Mustafa Yüzükırmızı.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures (Summary Measures) Basic Business Statistics.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 3-1 Chapter 3 Numerical Descriptive Measures Basic Business Statistics 11 th Edition.
Descriptive Statistics(Summary and Variability measures)
Statistical Methods © 2004 Prentice-Hall, Inc. Week 3-1 Week 3 Numerical Descriptive Measures Statistical Methods.
Chapter 2 Describing Data: Numerical
Yandell – Econ 216 Chap 3-1 Chapter 3 Numerical Descriptive Measures.
Descriptive Statistics ( )
Statistics for Managers Using Microsoft® Excel 5th Edition
Business and Economics 6th Edition
Descriptive Statistics
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
Description of Data (Summary and Variability measures)
Chapter 3 Describing Data Using Numerical Measures
Numerical Descriptive Measures
BUS173: Applied Statistics
Numerical Descriptive Measures
Business and Economics 7th Edition
Presentation transcript:

STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures

After completing this chapter, you should be able to: Compute and interpret the mean, median, and mode for a set of data Compute the range, variance, and standard deviation and know what these values mean Construct and interpret a box and whiskers plot Compute and explain the coefficient of variation and z scores Use numerical measures along with graphs, charts, and tables to describe data Chapter Goals

Summary Measures Center and Location Mean Median Mode Other Measures of Location Weighted Mean Describing Data Numerically Variation Variance Standard Deviation Coefficient of Variation Range Percentiles Interquartile Range Quartiles GeoMean RMS

Measures of Center and Location Center and Location MeanMedian ModeWeighted Mean Overview GeomeanRMS

Mean (Arithmetic Average) The Mean is the arithmetic average of data values Sample mean Population mean n = Sample Size N = Population Size

Mean (Arithmetic Average) The most common measure of central tendency Mean = sum of values divided by the number of values Affected by extreme values (outliers) (continued) Mean = Mean = 4

Median Not affected by extreme values In an ordered array, the median is the “middle” number If n or N is odd, the median is the middle number If n or N is even, the median is the average of the two middle numbers Median = Median = 3

Mode A measure of central tendency Value that occurs most often Not affected by extreme values Used for either numerical or categorical data There may may be no mode There may be several modes Mode = No Mode

Geometric Mean A measure of central tendency Only used for positive, numerical data Same as IRR Always less than or equal to Mean

Root Mean Square (RMS) A measure of central tendency Measures physical activity, not affected by negative values There may may be no mode There may be an infinite number of modes

Weighted Mean Used when values are grouped by frequency or relative importance Days to Complete Frequency Example: Sample of 26 Repair Projects Weighted Mean Days to Complete:

Central Measure Example

Summary Statistics Mean: ($3,000,000/5) = $600,000 Median: middle value of ranked data = $300,000 Mode: most frequent value = $100,000 House Prices: $2,000, , , , ,000 Sum 3,000,000

Mean is generally used, unless extreme values (outliers) exist Then median is often used, since the median is not sensitive to extreme values. Example: Median home prices may be reported for a region – less sensitive to outliers Which measure of location is the “best”?

Shape of a Distribution Describes how data is distributed Symmetric or skewed Mean = Median = Mode Mean < Median < Mode Mode < Median < Mean Right-Skewed Left-Skewed Symmetric (Longer tail extends to left)(Longer tail extends to right)

Other Location Measures Other Measures of Location PercentilesQuartiles 1 st quartile = 25 th percentile 2 nd quartile = 50 th percentile = median 3 rd quartile = 75 th percentile The p th percentile in a data array: p% are less than or equal to this value (100 – p)% are greater than or equal to this value (where 0 ≤ p ≤ 100)

Quartiles Quartiles split the ranked data into 4 equal groups 25% Sample Data in Ordered Array: Example: Find the first quartile (n = 9) Q1 = 25 th percentile, so find the (9+1) = 2.5 position so use the value half way between the 2 nd and 3 rd values, so Q1 = Q1Q2Q3

Percentiles The p th percentile in an ordered array of n values is the value in i th position, where Example: The 60 th percentile in an ordered array of 19 values is the value in 12 th position:

Box and Whisker Plot A Graphical display of data using 5-number summary: Minimum -- Q1 -- Median -- Q3 -- Maximum Example: 25% 25%

Shape of Box and Whisker Plots The Box and central line are centered between the endpoints if data is symmetric around the median A Box and Whisker plot can be shown in either vertical or horizontal format

Distribution Shape and Box and Whisker Plot Right-SkewedLeft-SkewedSymmetric Q1Q2Q3Q1Q2Q3 Q1Q2Q3

Box-and-Whisker Plot Example Below is a Box-and-Whisker plot for the following data: This data is very right skewed, as the plot depicts Min Q1 Q2 Q3 Max

Measures of Variation Variation Variance Standard DeviationCoefficient of Variation Population Variance Sample Variance Population Standard Deviation Sample Standard Deviation Range Interquartile Range

Measures of variation give information on the spread or variability of the data values. Variation Same center, different variation

Range Simplest measure of variation Difference between the largest and the smallest observations: Range = x maximum – x minimum Range = = 13 Example:

Ignores the way in which data are distributed Sensitive to outliers Range = = Range = = 5 Disadvantages of the Range 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 Range = = 4 Range = = 119

Interquartile Range Can eliminate some outlier problems by using the interquartile range Eliminate some high-and low-valued observations and calculate the range from the remaining values. Interquartile range = 3 rd quartile – 1 st quartile

Interquartile Range Median (Q2) X maximum X minimum Q1Q3 Example: 25% 25% Interquartile range = 57 – 30 = 27

“Outliers” 1.5 IQR Criterion IQR= Q3 – Q1 Q IQR Q IQR “2-Sigma” Criterion (2  )

Average of squared deviations of values from the mean Sample variance: Population variance: Variance

Standard Deviation Most commonly used measure of variation Shows variation about the mean Has the same units as the original data Sample standard deviation: Population standard deviation:

Calculation Example: Sample Standard Deviation Sample Data (X i ) : n = 8 Mean = x = 16

Comparing Standard Deviations Mean = 15.5 s = Data B Data A Mean = 15.5 s = Mean = 15.5 s = 4.57 Data C

Coefficient of Variation Measures relative variation Always in percentage (%) Shows variation relative to mean Is used to compare two or more sets of data measured in different units Population Sample

Comparing Coefficient of Variation Stock A: Average price last year = $50 Standard deviation = $5 Stock B: Average price last year = $100 Standard deviation = $5 Both stocks have the same standard deviation, but stock B is less variable relative to its price

If the data distribution is bell-shaped, then the interval: contains about 68% of the values in the population or the sample The Empirical Rule 68%

contains about 95% of the values in the population or the sample contains about 99.7% of the values in the population or the sample The Empirical Rule 99.7%95%

Regardless of how the data are distributed, at least (1 - 1/k 2 ) of the values will fall within k standard deviations of the mean Examples: (1 - 1/1 2 ) = 0% ……..... k=1 (μ ± 1σ) (1 - 1/2 2 ) = 75% … k=2 (μ ± 2σ) (1 - 1/3 2 ) = 89% ………. k=3 (μ ± 3σ) Tchebysheff’s Theorem withinAt least

A standardized data value refers to the number of standard deviations a value is from the mean Standardized data values are sometimes referred to as z-scores Standardized Data Values

where: x = original data value μ = population mean σ = population standard deviation z = standard score (number of standard deviations x is from μ) Standardized Population Values

where: x = original data value x = sample mean s = sample standard deviation z = standard score (number of standard deviations x is from μ) Standardized Sample Values

Using Microsoft Excel Descriptive Statistics are easy to obtain from Microsoft Excel Use menu choice: tools / data analysis / descriptive statistics Enter details in dialog box

Excel output Microsoft Excel descriptive statistics output, using the house price data: House Prices: $2,000, , , , ,000

Chapter Summary Described measures of center and location Mean, median, mode, geometric mean, midrange Discussed percentiles and quartiles Described measure of variation Range, interquartile range, variance, standard deviation, coefficient of variation Created Box and Whisker Plots

Chapter Summary Illustrated distribution shapes Symmetric, skewed Discussed Tchebysheff’s Theorem Calculated standardized data values (continued)