Chapter 2 Describing Data with Numerical Measurements


General Objectives: Graphs are extremely useful for the visual description of a data set. However, they are not always the best tool when you want to make inferences about a population from the information contained in a sample. For this purpose, it is better to use numerical measures to construct a mental picture of the data. ©1998 Brooks/Cole Publishing/ITP

Specific Topics
1. Measures of center: mean, median, and mode
2. Measures of variability: range, variance, and standard deviation
3. Tchebysheff’s Theorem and the Empirical Rule
4. Measures of relative standing: z-scores, percentiles, quartiles, and the interquartile range
5. Box plots

2.1 and 2.2 Describing a Set of Data with Numerical Measures and Measures of Center
Definition: Numerical descriptive measures associated with a population of measurements are called parameters. Those computed from sample measurements are called statistics.
Definition: The arithmetic mean or average of a set of n measurements is equal to the sum of the measurements divided by n.
Notation: Sample mean: $\bar{x} = \frac{\sum x_i}{n}$. Population mean: $\mu$.

Example 2.1 Use a dotplot to display the n = 5 measurements 2, 9, 11, 5, 6. Find the sample mean of these observations, and compare its value with what you might consider the “center” of these observations on the dotplot.
Solution The dotplot in Figure 2.2 seems to be centered between 6 and 8. To find the sample mean, we can calculate $\bar{x} = \frac{\sum x_i}{n} = \frac{2 + 9 + 11 + 5 + 6}{5} = 6.6$. The statistic $\bar{x} = 6.6$ is the balancing point or fulcrum shown on the dotplot. It does seem to mark the center of the data.
Figure 2.2 Character Dotplot
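The calculation in Example 2.1 can be reproduced in a few lines. This is only a minimal sketch; the function name `sample_mean` is chosen here for illustration and does not come from the slides.

```python
# Measurements from Example 2.1
measurements = [2, 9, 11, 5, 6]

def sample_mean(values):
    """Sum of the measurements divided by n (illustrative helper)."""
    return sum(values) / len(values)

x_bar = sample_mean(measurements)
print(x_bar)  # 6.6
```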

Definition: The median m of a set of n measurements is the value of x that falls in the middle position when the measurements are ordered from smallest to largest. The median is less sensitive to extreme values or outliers than the mean. The value .5(n + 1) indicates the position of the median in the ordered data set. When n is even, this position falls halfway between two measurements, and the median is their average.
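As a rough sketch of the position rule .5(n + 1), the helper below orders the data and reads off the middle position. Averaging the two middle values when n is even is the usual convention and is assumed here. The second data set, with 11 replaced by 110, is a made-up variation of Example 2.1 used only to show how little the median reacts to an extreme value.

```python
def median(values):
    """Median via the 1-based position .5(n + 1) in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)
    lower = int(position)              # whole part of the position
    if position == lower:              # n odd: a single middle value
        return ordered[lower - 1]
    # n even: average the two values on either side of the position
    return (ordered[lower - 1] + ordered[lower]) / 2

original = [2, 9, 11, 5, 6]            # Example 2.1 data
with_outlier = [2, 9, 110, 5, 6]       # hypothetical: 11 replaced by 110

for data in (original, with_outlier):
    mean = sum(data) / len(data)
    print("mean:", mean, "median:", median(data))
```

The mean moves from 6.6 to 26.4 when the extreme value is introduced, while the median stays at 6.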


Figure 2.3(a) Relative frequency distribution showing the effect of extreme values on the mean and median

Figure 2.3(b)

Definition: The mode is the category that occurs most frequently, or the most frequently occurring value of x. When measurements on a continuous variable have been grouped as a frequency or relative frequency histogram, the class with the highest frequency is called the modal class. The midpoint of the modal class is taken as the mode.
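One way to find the most frequently occurring value is to count occurrences, as in the sketch below. The data values are invented for illustration. Returning every tied value is a design choice made here, since a data set can have more than one mode.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([3, 5, 5, 7, 9]))      # [5]
print(modes([1, 1, 2, 2, 3]))      # [1, 2]
```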

Figure 2.4(a) Relative frequency histograms for the milk data

Figure 2.4(b) Relative frequency histograms for the GPA data

2.3 Measures of Variability
Variability or dispersion is a very important characteristic of data. See Figure 2.5 for examples of variability or dispersion of data.
Definition: The range, R, of a set of n measurements is defined as the difference between the largest and the smallest measurements.
See Figure 2.6 for an example of two relative frequency distributions that have the same range but very different shape and variability. Figure 2.7 shows the deviations of points from the mean. Table 2.1 shows a computation of $\sum (x_i - \bar{x})^2$ for the data in Figure 2.7.
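The point that two data sets can share a range yet differ in variability can be checked numerically. The two small data sets below are invented for illustration. The sample standard deviation, defined later in this chapter, is used as the measure of variability.

```python
import statistics

def data_range(values):
    """Range R: largest measurement minus smallest measurement."""
    return max(values) - min(values)

# Hypothetical data sets with the same extremes but different spread
clustered = [1, 5, 5, 5, 5, 5, 9]
spread_out = [1, 1, 1, 5, 9, 9, 9]

for name, data in (("clustered", clustered), ("spread_out", spread_out)):
    print(name, "range:", data_range(data),
          "std dev:", round(statistics.stdev(data), 3))
```

Both sets have R = 8, but the second has a noticeably larger standard deviation, which is why the range alone is a crude measure of variability.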

Figure 2.5(a) Variability or dispersion of data

Figure 2.5(b) Variability or dispersion of data

Figure 2.6(a) Distributions with equal range and unequal variability

Figure 2.6(b) Distributions with equal range and unequal variability

Figure 2.7 Showing the deviations of points from the mean

Definition: The variance of a population of N measurements is defined to be the average of the squares of the deviations of the measurements about their mean $\mu$. The population variance is denoted by $\sigma^2$ and is given by the formula $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. This measure will be relatively large for highly variable data and relatively small for less variable data.
Definition: The variance of a sample of n measurements is defined to be the sum of the squared deviations of the measurements about their mean $\bar{x}$ divided by (n - 1).

The sample variance is denoted by $s^2$ and is given by the formula: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.
Definition: The standard deviation of a set of measurements is equal to the positive square root of the variance.
Notation:
- n: number of measurements in the sample
- $s^2$: sample variance
- $s = \sqrt{s^2}$: sample standard deviation

The shortcut method for calculating $s^2$: $s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$, where $\sum x_i^2$ = sum of the squares of the individual measurements and $(\sum x_i)^2$ = square of the sum of the individual measurements.
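The definition and the shortcut method should agree. The sketch below checks this on the five measurements from Example 2.1, which are reused here only for convenience. The variable names are illustrative.

```python
import math

measurements = [2, 9, 11, 5, 6]        # data from Example 2.1
n = len(measurements)
x_bar = sum(measurements) / n

# Definition: sum of squared deviations about the mean, divided by (n - 1)
s2_definition = sum((x - x_bar) ** 2 for x in measurements) / (n - 1)

# Shortcut: (sum of squares - (square of the sum) / n) / (n - 1)
sum_of_squares = sum(x ** 2 for x in measurements)     # 267
square_of_sum = sum(measurements) ** 2                 # 33^2 = 1089
s2_shortcut = (sum_of_squares - square_of_sum / n) / (n - 1)

s = math.sqrt(s2_definition)           # sample standard deviation
print(round(s2_definition, 4), round(s2_shortcut, 4), round(s, 4))
```

Both routes give $s^2 = 12.3$ for these data: the sum of squared deviations is 49.2, and 267 - 1089/5 is also 49.2.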

Points to remember about variance and standard deviation:
- The value of s is always greater than or equal to zero.
- The larger the value of $s^2$ or s, the greater the variability of the data set.
- If $s^2$ or s is equal to zero, all measurements must have the same value.
- The standard deviation s is computed in order to have a measure of variability measured in the same units as the observations.

Empirical Rule: Given a distribution of measurements that is approximately mound-shaped:
- The interval $(\mu \pm \sigma)$ contains approximately 68% of the measurements.
- The interval $(\mu \pm 2\sigma)$ contains approximately 95% of the measurements.
- The interval $(\mu \pm 3\sigma)$ contains almost all of the measurements.
The Empirical Rule applies to data with a normal distribution and many other types of data. Use the Empirical Rule when the data distribution is roughly mound-shaped.
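The Empirical Rule can be checked roughly by simulation. The sketch below draws a hypothetical mound-shaped sample from a normal distribution (the mean of 50, standard deviation of 10, and seed are arbitrary choices). It uses the sample mean and sample standard deviation as stand-ins for $\mu$ and $\sigma$.

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values within k sample standard deviations of the sample mean."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)

# Hypothetical mound-shaped sample: 10,000 draws from a normal distribution
rng = random.Random(0)                 # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s):", round(fraction_within(sample, k), 3))
```

For a sample like this the three fractions should land close to 0.68, 0.95, and nearly 1. A strongly skewed sample would not be expected to match as well.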

2.6 Measures of Relative Standing
Definition: The sample z-score is a measure of relative standing defined by $z = \frac{x - \bar{x}}{s}$. A z-score measures the distance between an observation and the mean, measured in units of standard deviation. An outlier is an unusually large or small observation. z-scores between -2 and +2 are highly likely. z-scores exceeding 3 in absolute value are very unlikely.
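A short sketch of the z-score calculation follows, again reusing the Example 2.1 measurements for convenience. The helper name `z_score` is illustrative.

```python
import statistics

def z_score(x, sample):
    """Distance of x from the sample mean, in units of sample standard deviation."""
    return (x - statistics.mean(sample)) / statistics.stdev(sample)

measurements = [2, 9, 11, 5, 6]        # data from Example 2.1
for x in measurements:
    print(x, round(z_score(x, measurements), 2))
```

Every z-score in this small sample lies between -2 and +2, so by the guideline above none of the observations would be flagged as unusual.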

Definition: A set of n measurements on the variable x has been arranged in order of magnitude. The pth percentile is the value of x that exceeds p% of the measurements and is less than the remaining (100 - p)%.
Example 2.13 Suppose you have been notified that your score of 610 on the Verbal Graduate Record Examination placed you at the 60th percentile in the distribution of scores. Where does your score of 610 stand in relation to the scores of others who took the examination?
Solution Scoring at the 60th percentile means that 60% of all examination scores were lower than yours and 40% were higher.

The median is the same as the 50th percentile. The 25th and 75th percentiles are called the lower and upper quartiles. Figure 2.12

Figure 2.13

Definition: A set of n measurements on the variable x has been arranged in order of magnitude. The lower quartile (first quartile), Q1, is the value of x that exceeds one-fourth of the measurements and is less than the remaining three-fourths. The second quartile is the median. The upper quartile (third quartile), Q3, is the value of x that exceeds three-fourths of the measurements and is less than the remaining one-fourth.

When the measurements are arranged in order of magnitude, the lower quartile, Q1, is the value of x in the position .25(n + 1). The upper quartile, Q3, is the value of x in the position .75(n + 1). When these positions are not integers, the quartiles are found by interpolation, using the values in the two adjacent positions.
Definition: The interquartile range (IQR) for a set of measurements is the difference between the upper and lower quartiles; that is, IQR = Q3 - Q1.
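The position rule with interpolation can be sketched as follows. The ten data values are illustrative and are not taken from the transcript. Linear interpolation between the two adjacent ordered values is assumed. Clamping the position to the range 1 to n is an added safeguard for very small samples, not something the slides specify.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ordered data."""
    n = len(ordered)
    position = min(max(position, 1), n)    # assumption: clamp for very small n
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between the values in the two adjacent positions
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def quartiles(values):
    """Return (Q1, Q3) using positions .25(n + 1) and .75(n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return q1, q3

data = [16, 25, 4, 18, 11, 13, 20, 8, 11, 9]   # illustrative values
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 8.75 18.5 9.75
```

With n = 10 the positions are 2.75 and 8.25, so Q1 lies three-fourths of the way from 8 to 9 and Q3 lies one-fourth of the way from 18 to 20.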

2.7 The Box Plot
From a box plot, you can quickly detect any skewness in the shape of the distribution and see whether there are any outliers in the data set.
To construct a box plot:
1. Calculate the median, the upper and lower quartiles, and the IQR.
2. Draw a horizontal line representing the scale of measurement.
3. Form a box above the line with the ends at Q1 and Q3.
4. Draw a vertical line through the box at the location of m, the median.

Calculate lower and upper fences as follows:
- Inner fences: Q1 - 1.5(IQR) (lower) and Q3 + 1.5(IQR) (upper)
- Measurements outside the lower and upper fences are called suspect outliers.
To finish the box plot:
- Locate the largest and smallest measurements inside the fences using the scale along the horizontal axis, and connect them to the box with horizontal lines called whiskers.
- Any suspect outliers are marked with an asterisk (*).
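The fence and whisker calculations can be sketched numerically without drawing anything. The data below are illustrative, with one deliberately large value added. Python's `statistics.quantiles` is used for the quartiles; its default `method="exclusive"` follows the same (n + 1) position rule with interpolation described for Q1 and Q3.

```python
import statistics

# Illustrative data with one deliberately extreme value (40)
data = [16, 25, 4, 18, 11, 13, 20, 8, 11, 9, 40]

q1, median, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Suspect outliers lie outside the fences; whiskers stop at the most
# extreme measurements that are still inside the fences.
suspect_outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print("Q1, median, Q3:", q1, median, q3)
print("fences:", lower_fence, upper_fence)
print("whiskers:", whiskers, "suspect outliers:", suspect_outliers)
```

Here the fences are -7.5 and 36.5, so the value 40 is a suspect outlier and the whiskers run from 4 to 25.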

Figure 2.15 shows the various values associated with the box plot. Example 2.15 exhibits the calculations for and the plotting of a box plot. Skewed distributions usually have a long whisker in the direction of the skewness, and the median line is drawn away from the direction of the skewness.
Figure 2.15