MEASURES OF DISPERSION

The measures of central tendency, such as the mean, median and mode, do not reveal the whole picture of the distribution of a data set. Two data sets with the same mean may have completely different spreads. The variation among the values of observations for one data set may be much larger or smaller than for the other data set.
NOTE: the words dispersion, spread and variation have the same meaning.

MEASURES OF DISPERSION: example
Consider the following two data sets on the ages of all workers in each of two small companies.
Company 1: 47 38 35 40 36 45 39
Company 2: 70 33 18 52 27
The mean age of workers in both companies is the same: 40 years. Knowing only these means, we might conclude that the workers have a similar age distribution in the two companies. However, the variation in the workers’ ages is very different. Sorted from youngest to oldest, the ages are:
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
The ages of the workers in the second company have a much larger variation than the ages of the workers in the first company.
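
The comparison above can be checked numerically. The following is a minimal sketch using only the ages listed in the example; the variable names are illustrative, and the "spread" printed here is simply the distance between the youngest and the oldest worker.

```python
from statistics import mean

# Ages of all workers in the two companies (data from the example above).
company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    # Distance between the oldest and the youngest worker.
    spread = max(ages) - min(ages)
    print(f"{name}: mean = {mean(ages)}, youngest = {min(ages)}, "
          f"oldest = {max(ages)}, spread = {spread}")
```

Both means come out as 40, while the distance between youngest and oldest is 12 years in the first company and 52 years in the second.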

MEASURES OF DISPERSION
The mean, median or mode alone is usually not sufficient to reveal the shape of the distribution of a data set. We also need a measure that provides information about the variation among the values of the data set. Measures that describe the spread of a data set are called measures of dispersion. Taken together, the measures of central tendency and dispersion give a better picture of a data set.
We consider 3 measures of dispersion:
Range
Variance
Standard Deviation

RANGE
Definition: The range is the simplest measure of dispersion. It is obtained by taking the difference between the largest and the smallest values in a data set:
RANGE = LARGEST VALUE – SMALLEST VALUE

RANGE: example
The following data set gives the total areas in square miles of the 4 western South-Central states of the United States.
State        Total Area (square miles)
Arkansas      53,182
Louisiana     49,651
Oklahoma      69,903
Texas        267,277
RANGE = LARGEST VALUE – SMALLEST VALUE = 267,277 – 49,651 = 217,626 square miles
Thus, the total areas of these four states are spread over a range of 217,626 square miles.

RANGE: disadvantages
The range, like the mean, has the disadvantage of being influenced by outliers. Consequently, it is not a good measure of dispersion for a data set containing outliers.
The calculation of the range is based on two values only: the largest and the smallest. All other values in the data set are ignored.
Thus, the range is not a very satisfactory measure of dispersion and it is, in fact, rarely used.
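
To illustrate how strongly a single extreme value drives the range, here is a small sketch using the state areas from the range example. The helper `value_range` is a hypothetical name, and treating Texas as the outlying value is an assumption made only for illustration.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    values = list(values)  # allow any iterable, including generators
    return max(values) - min(values)

# Total areas in square miles, from the range example.
areas = {
    "Arkansas": 53_182,
    "Louisiana": 49_651,
    "Oklahoma": 69_903,
    "Texas": 267_277,
}

full_range = value_range(areas.values())
# Same computation with the largest state left out (illustrative assumption).
range_without_texas = value_range(
    area for state, area in areas.items() if state != "Texas"
)

print(f"Range with all four states: {full_range:,} square miles")
print(f"Range without Texas:        {range_without_texas:,} square miles")
```

Dropping one observation changes the range from 217,626 to 20,252 square miles, which is the sensitivity to outliers described above.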

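VARIANCE
Definition: The variance is a measure of the dispersion of values based on their deviations from the mean. The variance is defined as:
for a population: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
for a sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
where $\mu$ and $\bar{x}$ are the population and sample means, and $N$ and $n$ are the population and sample sizes.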

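VARIANCE
The difference between an observation and the mean, $x - \mu$ (or $x - \bar{x}$ for a sample), is called the deviation from the mean. Consequently, the population variance is the arithmetic mean of the squared deviations from the mean; for a sample, the sum of squared deviations is divided by $n - 1$ instead of $n$.
From the computational point of view, it is easier and more efficient to use short-cut formulas to calculate the variance:
for a population: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
for a sample: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$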
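
The short-cut form follows from expanding the squared deviations. The derivation below is a sketch for the population case, assuming the standard definitions $\mu = \frac{\sum x}{N}$ and $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$; the sample case is identical with $\bar{x}$ and $n$ in the numerator and $n - 1$ as the divisor.

```latex
\begin{aligned}
\sum (x - \mu)^2
  &= \sum x^2 - 2\mu \sum x + N\mu^2 \\
  &= \sum x^2 - 2\,\frac{\left(\sum x\right)^2}{N} + \frac{\left(\sum x\right)^2}{N}
     \qquad \text{since } \mu = \frac{\sum x}{N} \\
  &= \sum x^2 - \frac{\left(\sum x\right)^2}{N},
\qquad\text{so}\qquad
\sigma^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{N}}{N}.
\end{aligned}
```

Only the two sums $\sum x$ and $\sum x^2$ are needed, which is why the worked examples tabulate $x$ and $x^2$.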

VARIANCE: example 1
Refer to the data on 2002 total payrolls of 5 Major League Baseball (MLB) teams.
MLB Team                  2002 Total Payroll (millions of dollars)
Anaheim Angels             62
Atlanta Braves             93
New York Yankees          126
St. Louis Cardinals        75
Tampa Bay Devil Rays       34

VARIANCE: example 1
We apply the short-cut formula, hence we need to compute the squares of the observations, x².
MLB Team                    x        x²
Anaheim Angels             62      3844
Atlanta Braves             93      8649
New York Yankees          126    15,876
St. Louis Cardinals        75      5625
Tampa Bay Devil Rays       34      1156
                      ∑x = 390   ∑x² = 35150

VARIANCE: example 2
The following data are the 2002 earnings (in thousands of dollars) before taxes for all 6 employees of a small company.
      x           x²
  48.50      2352.25
  38.40      1474.56
  65.50      4290.25
  22.60       510.76
  79.80      6368.04
  54.60      2981.16
∑x = 309.40   ∑x² = 17977.02
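
The transcript does not preserve the final results of the two variance examples, so the sketch below works them out from the tabulated sums. It assumes the standard short-cut formulas (divisor n - 1 for a sample, N for a population). Treating the five MLB teams as a sample is an assumption; the six employees are stated to be all employees of the company, so they are treated as a population. The function name `shortcut_variance` is hypothetical.

```python
def shortcut_variance(values, sample):
    """Variance from the sums of x and x^2 (short-cut formula).

    sample=True divides by n - 1; sample=False divides by N.
    """
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = sum_x2 - sum_x ** 2 / n
    return numerator / (n - 1 if sample else n)

# Example 1: payrolls in millions of dollars (assumed to be a sample).
payrolls = [62, 93, 126, 75, 34]
# Example 2: earnings in thousands of dollars (all 6 employees: a population).
earnings = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]

payroll_variance = shortcut_variance(payrolls, sample=True)
earnings_variance = shortcut_variance(earnings, sample=False)

print(f"Example 1, sample variance:     {payroll_variance:.1f}")   # 1182.5
print(f"Example 2, population variance: {earnings_variance:.2f}")  # about 337.05
```

Note that the variance is expressed in squared units (squared millions or squared thousands of dollars), which is one reason the standard deviation is usually reported alongside it.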

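VARIANCE: frequency distribution
The formula for the variance changes slightly if the observations are grouped into a frequency table. Each squared deviation is multiplied by the frequency $n_i$ of its value $x_i$, and the results are then summed.
for a population: $\sigma^2 = \frac{\sum n_i (x_i - \mu)^2}{N}$
for a sample: $s^2 = \frac{\sum n_i (x_i - \bar{x})^2}{n - 1}$
where $N$ (or $n$) is the sum of the frequencies $n_i$.
The short-cut formulas become:
for a population: $\sigma^2 = \frac{\sum x_i^2 n_i - \frac{(\sum x_i n_i)^2}{N}}{N}$
for a sample: $s^2 = \frac{\sum x_i^2 n_i - \frac{(\sum x_i n_i)^2}{n}}{n - 1}$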

VARIANCE: example 3
Vehicles Owned (xi)   Number of Households (ni)   xi * ni   xi²   xi² * ni
        0                        2                    0       0        0
        1                       18                   18       1       18
        2                       11                   22       4       44
        3                        4                   12       9       36
        4                        3                   12      16       48
        5                        2                   10      25       50
       Sum                      40                   74              196
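
As a check on example 3, the sketch below recomputes the column sums and applies the frequency-table short-cut formula. The frequencies used (2, 18, 11, 4, 3, 2 households for 0 to 5 vehicles) are an assumption: they are the values consistent with the column sums 40, 74 and 196 that survive in the transcript. Whether the households form a sample or a population is not stated, so both divisors are shown.

```python
# Vehicles owned (x_i) and number of households (n_i).
# Assumed values, chosen to match the sums 40, 74 and 196 in the table.
vehicles = [0, 1, 2, 3, 4, 5]
households = [2, 18, 11, 4, 3, 2]

n_total = sum(households)                                        # sum of n_i
sum_xn = sum(x * f for x, f in zip(vehicles, households))        # sum of x_i * n_i
sum_x2n = sum(x * x * f for x, f in zip(vehicles, households))   # sum of x_i^2 * n_i

# Short-cut numerator shared by both versions of the formula.
numerator = sum_x2n - sum_xn ** 2 / n_total

sample_variance = numerator / (n_total - 1)
population_variance = numerator / n_total

print(f"n = {n_total}, sum(x*n) = {sum_xn}, sum(x^2*n) = {sum_x2n}")
print(f"sample variance     = {sample_variance:.4f}")
print(f"population variance = {population_variance:.4f}")
```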

Variance: frequency distribution with classes
Again, when the data set is organized in a frequency distribution with classes, we approximate the data set by "rounding" each value in a given class to the class midpoint. Thus, the variance of a frequency distribution with classes is given by the short-cut formulas:
for a population: $\sigma^2 = \frac{\sum m_i^2 n_i - \frac{(\sum m_i n_i)^2}{N}}{N}$
for a sample: $s^2 = \frac{\sum m_i^2 n_i - \frac{(\sum m_i n_i)^2}{n}}{n - 1}$
where $m_i$ is the midpoint of each class interval and $n_i$ is the class frequency.

Variance: example 4
The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.
Number of Orders   Number of Days (n)    m     m²     m*n     m²*n
    10 – 12                4            11    121      44      484
    13 – 15               12            14    196     168     2352
    16 – 18               20            17    289     340     5780
    19 – 21               14            20    400     280     5600
                       n = 50                   ∑m*n = 832   ∑m²*n = 14216
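
The result of example 4 is not preserved in the transcript. The sketch below computes it from the class midpoints using the definition (frequency-weighted squared deviations from the mean), which gives the same value as the short-cut formula. Whether the 50 days are a sample or a population is not stated, so both divisors are shown; variable names are illustrative.

```python
# Class limits (number of orders) and class frequencies (number of days).
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
days = [4, 12, 20, 14]

# Each value in a class is approximated by the class midpoint m.
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(days)
mean_orders = sum(m * f for m, f in zip(midpoints, days)) / n

# Frequency-weighted sum of squared deviations from the mean.
squared_deviations = sum(f * (m - mean_orders) ** 2
                         for m, f in zip(midpoints, days))

sample_variance = squared_deviations / (n - 1)
population_variance = squared_deviations / n

print(f"mean = {mean_orders:.2f} orders per day")
print(f"sample variance     = {sample_variance:.4f}")
print(f"population variance = {population_variance:.4f}")
```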

STANDARD DEVIATION
Definition: The standard deviation is the positive square root of the variance.
for a population: $\sigma = \sqrt{\sigma^2}$
for a sample: $s = \sqrt{s^2}$

STANDARD DEVIATION
The standard deviation is the most widely used measure of dispersion. The value of the standard deviation tells how closely the values of a data set are clustered around the mean.
In general, a smaller value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.
In contrast, a larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean.

STANDARD DEVIATION: example 1
2002 Total Payroll (millions of dollars)
MLB Team                    x        x²
Anaheim Angels             62      3844
Atlanta Braves             93      8649
New York Yankees          126    15,876
St. Louis Cardinals        75      5625
Tampa Bay Devil Rays       34      1156
                      ∑x = 390   ∑x² = 35150

STANDARD DEVIATION: example 2
Earnings (thousands of dollars)
      x           x²
  48.50      2352.25
  38.40      1474.56
  65.50      4290.25
  22.60       510.76
  79.80      6368.04
  54.60      2981.16
∑x = 309.40   ∑x² = 17977.02

Variance and Standard Deviation: observations
The values of the variance and the standard deviation are never negative. That is, the numerator in the formula for the variance should never produce a negative value.
Usually the values of the variance and the standard deviation are positive, but if a data set has no variation, then the variance and the standard deviation are both zero.
Example: 4 persons in a group are the same age, say 35 years. If we calculate the variance and the standard deviation, their values are zero.
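
The standard deviations for the two examples, and the zero-variation case just described, can be reproduced with Python's standard `statistics` module. This is a sketch under the same assumptions as before: the five MLB payrolls are treated as a sample (`stdev`, divisor n - 1), which is an assumption, and the six earnings as a population (`pstdev`, divisor N).

```python
from statistics import pstdev, pvariance, stdev

# Example 1: payrolls in millions of dollars (assumed to be a sample).
payrolls = [62, 93, 126, 75, 34]
# Example 2: earnings in thousands of dollars (all employees: a population).
earnings = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]
# Four persons of the same age: a data set with no variation.
same_ages = [35, 35, 35, 35]

payroll_sd = stdev(payrolls)     # sample standard deviation
earnings_sd = pstdev(earnings)   # population standard deviation

print(f"Example 1: s     = {payroll_sd:.2f} million dollars")
print(f"Example 2: sigma = {earnings_sd:.2f} thousand dollars")
print(f"Same ages: variance = {pvariance(same_ages)}, "
      f"standard deviation = {pstdev(same_ages)}")
```

Unlike the variance, the standard deviation is in the same units as the data, so the first value reads directly as millions of dollars and the second as thousands of dollars.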

CONTINGENCY TABLES AND ELEMENTS OF PROBABILITY

CONTINGENCY TABLES
In many applications the interest is focused on the joint analysis of two variables (qualitative and/or quantitative) with the aim of evaluating the relation between them.
The variables are usually presented as a contingency table (or two-way classification table).
Whereas a frequency distribution provides the distribution of one variable, a contingency table describes the distribution of two or more variables simultaneously.

CONTINGENCY TABLES
All 420 employees of a company were asked if they are smokers or nonsmokers and whether or not they are college graduates.
              College Graduate   Not a College Graduate
Smoker               35                    80
Nonsmoker           130                   175
The table gives the distribution of the 420 employees based on two variables or characteristics: X, smoking status (smoker or nonsmoker), and Y, college graduation (graduate or not).
Each entry of the table is called a cell. For example, 80 is the joint frequency of the category “Smoker” of X and the category “Not a College Graduate” of Y.

CONTINGENCY TABLES: marginal distributions
                        Y
X             College Graduate   Not a College Graduate   Total
Smoker               35                    80              115
Nonsmoker           130                   175              305
Total               165                   255              420
The right-hand column (115, 305) and the bottom row (165, 255) are called the marginal distribution of X and the marginal distribution of Y, respectively. The value 420 is the grand total.

CONTINGENCY TABLES
Marginal distribution of X:
X            Total
Smoker        115
Nonsmoker     305
Total         420
Marginal distribution of Y:
Y                         Total
College Graduate           165
Not a College Graduate     255
Total                      420
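
The marginal distributions are just the row and column totals of the contingency table. The following sketch derives them from the four cell counts; the nested-dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Joint frequencies: outer key is X (smoking status), inner key is Y (graduation).
table = {
    "Smoker": {"College Graduate": 35, "Not a College Graduate": 80},
    "Nonsmoker": {"College Graduate": 130, "Not a College Graduate": 175},
}

# Marginal distribution of X: sum each row.
marginal_x = {x: sum(cells.values()) for x, cells in table.items()}

# Marginal distribution of Y: sum each column.
marginal_y = {}
for cells in table.values():
    for y, count in cells.items():
        marginal_y[y] = marginal_y.get(y, 0) + count

grand_total = sum(marginal_x.values())

print("Marginal distribution of X:", marginal_x)
print("Marginal distribution of Y:", marginal_y)
print("Grand total:", grand_total)
```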

CONTINGENCY TABLES: conditional distributions
Conditional distribution of X given the category “College Graduate” of Y:
X            College Graduate
Smoker              35
Nonsmoker          130
Total              165
Conditional distribution of Y given the category “Smoker” of X:
Y                         Smoker
College Graduate            35
Not a College Graduate      80
Total                      115

Definition of probability
There are three different definitions of probability:
classical definition of probability,
frequentist definition of probability,
subjective (Bayesian) definition of probability.
Frequentist definition of probability: the relative frequency associated with a category of a variable (an event) can be interpreted as an approximation of the probability of that event.

Definition of probability
Example: Ten of 500 randomly selected cars manufactured at a certain auto factory are found to be lemons. Assuming that the lemons are manufactured randomly, what is the probability that the next car manufactured at this auto factory is a lemon?
Car (xi)     ni     Relative frequency (fi)
Good        490     490/500 = .98
Lemon        10     10/500 = .02
          n = 500   Sum = 1.00
Based on the relative frequency, the probability that the next car is a lemon is approximately .02.
NOTE: The relative frequency is an approximation of the probability. Relative frequencies and probabilities get closer as the number of cars increases.
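
The closing note can be illustrated with a small simulation. This is a sketch only: it assumes, purely for illustration, that the true probability of a lemon is exactly 0.02 (in practice this value is unknown and is what the relative frequency estimates). The function name and the fixed seed are hypothetical choices made so that the run is repeatable.

```python
import random

def relative_frequency_of_lemons(n_cars, p_lemon=0.02, seed=0):
    """Simulate n_cars independent cars and return the share that are lemons.

    p_lemon is an assumed true probability, used only for the simulation.
    """
    rng = random.Random(seed)
    lemons = sum(1 for _ in range(n_cars) if rng.random() < p_lemon)
    return lemons / n_cars

# Relative frequency observed in the example: 10 lemons among 500 cars.
observed = 10 / 500
print(f"Observed relative frequency: {observed}")

# Simulated relative frequencies for increasingly large numbers of cars.
for n_cars in (500, 5_000, 50_000, 500_000):
    freq = relative_frequency_of_lemons(n_cars)
    print(f"{n_cars:>7} cars: relative frequency of lemons = {freq:.4f}")
```

With few cars the relative frequency can sit noticeably above or below the assumed probability; with many cars it tends to settle close to it.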

Marginal Probability
We return to the example of the 420 employees. Suppose that one employee is selected at random from the 420 employees. The employee may be classified on the basis of smoking status alone or college graduation alone: the employee can be a “smoker”, a “nonsmoker”, a “graduate” or a “nongraduate”. The probability of each of these characteristics is called a marginal probability.
              College Graduate   Not a College Graduate   Total
Smoker               35                    80              115
Nonsmoker           130                   175              305
Total               165                   255              420

Marginal Probability
Marginal (Simple) Probability: the probability (relative frequency) computed on the marginal distributions of the contingency table:
P(Smoker) = 115/420 ≈ 0.274
P(Nonsmoker) = 305/420 ≈ 0.726
P(College Graduate) = 165/420 ≈ 0.393
P(Not a College Graduate) = 255/420 ≈ 0.607

Joint Probability
Suppose that one employee is selected at random from these 420. What is the probability that the employee is a smoker and a college graduate?
              College Graduate   Not a College Graduate   Total
Smoker               35                    80              115
Nonsmoker           130                   175              305
Total               165                   255              420
It is written as P(Smoker ∩ College Graduate). The symbol ∩ is read as “and”.

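Joint Probability
Joint Probability: the probability (relative frequency) computed on the joint distribution, that is, on the cells of the contingency table:
P(Smoker ∩ College Graduate) = 35/420 ≈ 0.083
P(Smoker ∩ Not a College Graduate) = 80/420 ≈ 0.190
P(Nonsmoker ∩ College Graduate) = 130/420 ≈ 0.310
P(Nonsmoker ∩ Not a College Graduate) = 175/420 ≈ 0.417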

Conditional Probability
Now suppose that one employee is selected at random from these 420, and that the employee is known to be a smoker. What is the probability that the selected employee is a college graduate?
              College Graduate   Not a College Graduate   Total
Smoker               35                    80              115
Nonsmoker           130                   175              305
Total               165                   255              420
It is written as P(Graduate | Smoker) and is read as “the probability that the employee is a college graduate given that the employee is a smoker”.

Conditional Probability
Conditional Probability: the probability (relative frequency) computed on the conditional distributions. Only the 115 smokers are considered, of whom 35 are college graduates:
P(Graduate | Smoker) = 35/115 ≈ 0.304
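
The three kinds of probability can be computed from the same four cell counts. The sketch below uses hypothetical helper names (`marginal_x`, `marginal_y`, `joint`, `conditional_y_given_x`) and also checks the usual relation between them, P(Y | X) = P(X ∩ Y) / P(X), which is a standard identity rather than something stated on the slides.

```python
# Joint frequencies keyed by (X category, Y category).
counts = {
    ("Smoker", "College Graduate"): 35,
    ("Smoker", "Not a College Graduate"): 80,
    ("Nonsmoker", "College Graduate"): 130,
    ("Nonsmoker", "Not a College Graduate"): 175,
}
total = sum(counts.values())  # 420 employees

def marginal_x(x):
    """P(X = x): row total divided by the grand total."""
    return sum(c for (xi, _), c in counts.items() if xi == x) / total

def marginal_y(y):
    """P(Y = y): column total divided by the grand total."""
    return sum(c for (_, yi), c in counts.items() if yi == y) / total

def joint(x, y):
    """P(X = x and Y = y): cell count divided by the grand total."""
    return counts[(x, y)] / total

def conditional_y_given_x(y, x):
    """P(Y = y | X = x): cell count divided by the row total for x."""
    row_total = sum(c for (xi, _), c in counts.items() if xi == x)
    return counts[(x, y)] / row_total

p_smoker = marginal_x("Smoker")
p_graduate = marginal_y("College Graduate")
p_smoker_and_graduate = joint("Smoker", "College Graduate")
p_graduate_given_smoker = conditional_y_given_x("College Graduate", "Smoker")

print(f"P(Smoker)                    = {p_smoker:.3f}")
print(f"P(College Graduate)          = {p_graduate:.3f}")
print(f"P(Smoker and Graduate)       = {p_smoker_and_graduate:.3f}")
print(f"P(Graduate | Smoker)         = {p_graduate_given_smoker:.3f}")
print(f"P(Smoker and Graduate)/P(Smoker) = {p_smoker_and_graduate / p_smoker:.3f}")
```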