Measures of Variability

Presentation transcript:

Measures of Variability In addition to knowing where the center of the distribution is, it is often helpful to know the degree to which individual values cluster around the center. This is known as variability.

Range There are various measures of variability, the most straightforward being the range of the sample: highest value minus lowest value. While the range provides a good first pass at variability, it is not the best measure because of its sensitivity to extreme scores.

While the range provides a good first pass at variability, it is not the best measure because: it is calculated from only 2 data points; those two values are the most extreme in the sample (and so obviously sensitive to outliers); and it can change dramatically from sample to sample.
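A minimal sketch of the range and its sensitivity to a single extreme score. It reuses the scores from the average-deviation example later in this transcript; the helper name `sample_range` is an illustrative choice, not something from the slides.

```python
# Scores reused from the average-deviation example later in the deck.
scores = [2, 3, 3, 4, 4, 6, 6, 12]


def sample_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


full_range = sample_range(scores)          # 12 - 2 = 10
trimmed_range = sample_range(scores[:-1])  # drop the 12: 6 - 2 = 4

print(full_range, trimmed_range)
```

Removing one score out of eight cuts the range by more than half, which is the instability the slide warns about.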

Interquartile Range A range based on percentiles. The data are ordered and then, much as we did for the median (the 50th percentile, also known as the second quartile or Q2), we look for the values corresponding to the 25th and 75th percentiles (Q1 and Q3). Q3 - Q1 gives us the interquartile range. Note that by concerning ourselves only with this middle 50% of scores, extreme scores will not affect this measure of variability. On the other hand, we also lose half our data in its calculation.

IQR and SIQR The semi-interquartile range (SIQR) is simply half the IQR. It represents the average spread of the scores falling in the quartiles just above and just below the median. If we had a scale with a median of 20 and an SIQR of 5, we could say that the typical deviation of scores about the median does not extend more than 5 points above or below. However, what if our data are skewed?
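The quartile-based measures above can be sketched with Python's standard `statistics` module. The data are the scores from the average-deviation example in this transcript. Quartile conventions differ between textbooks and software, so other data sets may give slightly different Q1 and Q3 depending on the method used.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]

# With n=4, statistics.quantiles returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1    # interquartile range
siqr = iqr / 2   # semi-interquartile range

print(q1, q2, q3, iqr, siqr)  # 3.0 4.0 6.0 3.0 1.5
```

The extreme score of 12 plays no part in the result, which is the robustness described in the slide.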

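IQR: Graphical Representation Box and whisker plots (Tukey) are graphical representations of the IQR
–“Hinges” mark the interquartile range
–“Whiskers” extend to the most extreme values within 1.5 × IQR of the hinges (roughly 99% of the data when the variable is normally distributed)
–Any outliers beyond the whiskers are designated in some fashion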
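As a rough sketch of how a box plot's pieces are computed, the code below applies the common Tukey convention of placing fences 1.5 × IQR beyond the hinges. That multiplier is an assumption here, since the slide does not state the rule it uses, and the variable names are illustrative.

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]

q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Assumed convention: fences sit 1.5 * IQR beyond each hinge.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences.
inside = [x for x in scores if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(whiskers, outliers)  # (2, 6) [12]
```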

The Average Deviation Another approach to estimating variability is to directly measure the degree to which individual data points differ from the mean and then average those deviations. That is: average deviation = Σ(X − X̄) / N

The Average Deviation (cont.) However, if we try to do this with real data, the result will always be zero: Example: (2,3,3,4,4,6,6,12)
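A short derivation of why the signed deviations always cancel, written with the usual notation (X̄ for the sample mean, N for the number of scores), which is assumed here rather than taken from the slide:

```latex
\sum_{i=1}^{N} (X_i - \bar{X})
  = \sum_{i=1}^{N} X_i - N\bar{X}
  = \sum_{i=1}^{N} X_i - N \cdot \frac{1}{N}\sum_{i=1}^{N} X_i
  = 0
```

For the example data the mean is 40/8 = 5, and the deviations −3, −2, −2, −1, −1, 1, 1, 7 sum to zero.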

Average Deviation One way to get around the problem with the average deviation is to use the absolute values of the differences instead of the differences themselves. The absolute value of a number is just the number without its sign; for example, |-3| = 3.

Average Deviation Thus, we could rewrite and solve our average deviation question as follows: MAD = Σ|X − X̄| / N = 18 / 8 = 2.25. The data set in question has a mean of 5 and a mean absolute deviation (MAD) of 2.25.
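The calculation can be checked with a few lines of Python, a minimal sketch using the example scores from the earlier slide (variable names are illustrative):

```python
scores = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(scores)

mean = sum(scores) / n  # 40 / 8 = 5.0

# Signed deviations cancel out, so their average is useless as a spread measure.
signed_average = sum(x - mean for x in scores) / n

# Absolute deviations do not cancel.
mad = sum(abs(x - mean) for x in scores) / n  # 18 / 8 = 2.25

print(mean, signed_average, mad)  # 5.0 0.0 2.25
```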

The Variance Although the MAD is an acceptable measure of variability, the most commonly used measure is the variance (denoted s² for a sample and σ² for a population) and its square root, termed the standard deviation (denoted s for a sample and σ for a population).

The Variance (cont.) The computation of variance is also based on the basic notion of the average deviation. However, instead of getting around the “zero problem” by using absolute deviations (as in the MAD), the “zero problem” is eliminated by squaring the differences from the mean. Specifically: s² = Σ(X − X̄)² / (N − 1)

Variance is not too meaningful on its own. What we’d like is something that’s on the same scale as the original variable. Standard deviation is just the square root of the variance, and gets our measure of variability back to the original scale units.
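A minimal sketch of the variance and standard deviation for the example scores used earlier (2, 3, 3, 4, 4, 6, 6, 12). It shows both the sample version (N − 1 denominator) and the population version (N denominator); the variable names are illustrative.

```python
import math

scores = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(scores)
mean = sum(scores) / n

# Squaring removes the signs, so the deviations no longer cancel.
sum_of_squares = sum((x - mean) ** 2 for x in scores)  # 70.0

sample_variance = sum_of_squares / (n - 1)  # s^2 = 70 / 7 = 10.0
sample_sd = math.sqrt(sample_variance)      # s, back in the original units

population_variance = sum_of_squares / n    # sigma^2 = 70 / 8 = 8.75

print(sample_variance, round(sample_sd, 3), population_variance)
```

The variance of 10 is in squared score units, while the standard deviation of about 3.16 is in the same units as the scores themselves.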

An equivalent formula that is easier to work with when calculating variances by hand is: s² = [ΣX² − (ΣX)² / N] / (N − 1). Although this second formula may look more intimidating, a few examples will show you that it is actually easier to work with.
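As a quick check, the sketch below applies the usual raw-score (computational) form, s² = [ΣX² − (ΣX)²/N] / (N − 1), to the example scores and compares it with the deviation-based result from the standard library. The raw-score form is assumed to be the formula the slide refers to.

```python
import math
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(scores)

sum_x = sum(scores)                   # 40
sum_x2 = sum(x * x for x in scores)   # 270

# Raw-score formula: only two running totals are needed, no deviations.
raw_score_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # (270 - 200) / 7

# Deviation-based sample variance for comparison.
deviation_variance = statistics.variance(scores)

print(raw_score_variance, deviation_variance)
```

Both routes give a variance of 10 for these scores. By hand the raw-score version is convenient because it avoids subtracting the mean from every score.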

Relation of range to s If the variable is normally distributed, a rule of thumb is that s ≈ R/6. This will be clearer when we talk about the normal distribution and its properties, but the idea is that roughly 99% of the data fall within 3 SD above and below the mean, so the range spans about 6 SD.

Comparing variability The coefficient of variation (CV), the standard deviation divided by the mean, allows us to compare variability on measures with different scales. Often we want something that has enough variability to accurately reflect the nature of the underlying variable. Suppose someone came up with a measure of, say, motivation, measured with questionnaire items on Likert scales ranging from 1 to 4. Someone else comes up with a better one that uses a 7-point scale. We could use the CV to compare their relative spread.
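A minimal sketch of the coefficient of variation, taken here as the sample standard deviation divided by the mean. The two response lists are made up purely for illustration and are not data from the slides.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical responses on a 4-point and a 7-point motivation scale.
four_point = [1, 2, 2, 3, 3, 3, 4, 4]
seven_point = [1, 2, 3, 4, 4, 5, 6, 7]

cv_four = coefficient_of_variation(four_point)
cv_seven = coefficient_of_variation(seven_point)

print(round(cv_four, 3), round(cv_seven, 3))
```

Because the CV is a ratio, the two numbers can be compared directly even though the scales have different ranges.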

Visualizing Means and Standard Deviations This demonstration allows you to play with the mean and standard deviation of a distribution. Note that changing the mean of the distribution simply moves the entire distribution to the left or right without changing its shape. In contrast, changing the standard deviation alters the spread of the data but does not affect where the distribution is “centered”.

Your turn Find the mean, range and standard deviation of the following scores:

Your turn Mean = 5.78 Range = 8 s = 2.68

Estimating Population Parameters The mean (X̄) and variance (s²) are the descriptive statistics most commonly used to represent the data points of a sample. They are the preferred measures of central tendency and variability because of certain properties they have as estimators of their corresponding population parameters, μ and σ².

Estimating Population Parameters (cont.) Four properties are considered desirable in a population estimator: sufficiency, unbiasedness, efficiency, and resistance. Both the mean and the variance are the best estimators in their class in terms of the first three of these four properties. To understand these properties, you first need to understand a concept in statistics called the sampling distribution.

Sampling Distribution Demo We will discuss sampling distributions off and on throughout the course, and I only want to touch on the notion now. Basically, the idea is this: in order to examine the properties of a statistic, we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.
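The repeated-sampling idea can be sketched as a small simulation. Everything specific here is an assumption chosen for illustration: a simulated normal population with mean 100 and SD 15, samples of 25, 2,000 repetitions, and the helper name `sampling_distribution`.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [random.gauss(100, 15) for _ in range(10_000)]


def sampling_distribution(statistic, sample_size=25, n_samples=2_000):
    """Draw repeated samples and compute the statistic on each one."""
    return [
        statistic(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]


sample_means = sampling_distribution(statistics.mean)

# The centre and spread of the statistic across samples.
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```

The list `sample_means` is an approximation to the sampling distribution of the mean. Its centre sits close to the population mean and its spread is much narrower than that of the individual scores.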

Properties of a Statistic 1) Sufficiency A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter. For example, this property makes the mean more attractive as a measure of central tendency compared to the mode or median.

Estimating Population Parameters 2) Unbiasedness A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of the statistic over a very large number of samples) is equal to the population parameter it is estimating. This property explains the N − 1 in the s² formula.

Assessing the Bias of an Estimator Using this repeated-sampling procedure, the mean can be shown to be an unbiased estimator. However, if the more intuitive formula for s² is used, s² = Σ(X − X̄)² / N, it turns out to underestimate σ².

Assessing the Bias of an Estimator (cont.) This tendency to underestimate is caused by the act of sampling, and it can be shown that the bias is eliminated if N − 1 is used in the denominator instead of N. Note that this is only true when calculating s². If you have a measurable population and you want to calculate σ², you use N in the denominator, not N − 1.
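The bias can be seen exactly, without random simulation, by listing every possible sample from a tiny population. The population of four values and the sample size of 2 (drawn with replacement) are assumptions chosen to keep the enumeration small.

```python
from itertools import product

population = [2, 4, 6, 8]  # hypothetical, fully known population
big_n = len(population)
mu = sum(population) / big_n
sigma2 = sum((x - mu) ** 2 for x in population) / big_n  # 5.0


def variance(sample, denominator):
    """Sum of squared deviations from the sample mean over a chosen denominator."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / denominator


n = 2
# Every possible sample of size 2 drawn with replacement (16 in total).
samples = list(product(population, repeat=n))

mean_with_n = sum(variance(s, n) for s in samples) / len(samples)
mean_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(sigma2, mean_with_n, mean_with_n_minus_1)  # 5.0 2.5 5.0
```

Averaged over all samples, dividing by N gives 2.5, which is too small, while dividing by N − 1 recovers the population variance of 5.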

Degrees of Freedom The mean of 6, 8, & 10 = 8. If I allow you to change as many of these numbers as you want BUT the mean must stay 8, how many of the numbers are you free to vary?

Degrees of Freedom The point of this exercise is that when the mean is fixed, it removes a degree of freedom from your sample; this is like actually subtracting 1 from the number of observations in your sample. It is for exactly this reason that we use N − 1 in the denominator when we calculate s² (i.e., the calculation requires that the mean be fixed first, which effectively removes, or fixes, one of the data points).
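A small sketch of the exercise: once the mean is fixed, choosing any N − 1 values forces the last one. The function name `forced_last_value` is illustrative.

```python
def forced_last_value(free_values, fixed_mean, n):
    """Given n - 1 freely chosen values, return the value the last one must take."""
    return fixed_mean * n - sum(free_values)


# The mean of three numbers must stay 8, so their total must stay 24.
print(forced_last_value([6, 8], fixed_mean=8, n=3))    # 10, the original set
print(forced_last_value([5, 7], fixed_mean=8, n=3))    # 12
print(forced_last_value([1, 100], fixed_mean=8, n=3))  # -77
```

Two of the three numbers can be anything at all, but the third is never free, so the sample has N − 1 = 2 degrees of freedom.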

Estimating Population Parameters 3) Efficiency The efficiency of a statistic is reflected in the variance that is observed when one examines the statistic over a bunch of independently chosen samples. The smaller the variance, the more efficient the statistic is said to be.
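Efficiency can be illustrated by comparing how much two statistics vary across repeated samples. The setup below is an assumption for illustration: samples of 25 from a standard normal population, 2,000 repetitions, comparing the mean with the median.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable


def variance_across_samples(statistic, sample_size=25, n_samples=2_000):
    """Variance of a statistic computed over independently drawn samples."""
    values = []
    for _ in range(n_samples):
        sample = [random.gauss(0, 1) for _ in range(sample_size)]
        values.append(statistic(sample))
    return statistics.variance(values)


var_of_means = variance_across_samples(statistics.mean)
var_of_medians = variance_across_samples(statistics.median)

print(var_of_means, var_of_medians)
```

For normally distributed data the sample mean varies less from sample to sample than the sample median, so the mean is the more efficient estimator of the centre in this setting.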

Estimating Population Parameters 4) Resistance The resistance of an estimator refers to the degree to which that estimate is affected by extreme values. As mentioned previously, both X̄ and s² are highly sensitive to extreme values.

Estimating Population Parameters 4) Resistance Despite this, they are still the most commonly used estimates of the corresponding population parameters, mostly because of their superiority over other measures in terms of sufficiency, unbiasedness, and efficiency.
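A brief sketch of what resistance means in practice, using the example scores from earlier in the deck and a deliberately exaggerated extreme value (the 120 is made up for illustration):

```python
import statistics

scores = [2, 3, 3, 4, 4, 6, 6, 12]
with_extreme = scores[:-1] + [120]  # replace the 12 with a wild score

for label, data in (("original", scores), ("with extreme", with_extreme)):
    print(
        label,
        "mean =", statistics.mean(data),
        "variance =", round(statistics.variance(data), 2),
        "median =", statistics.median(data),
    )
```

The mean jumps from 5 to 18.5 and the variance grows enormously, while the median stays at 4. The mean and variance are not resistant, whereas the median is.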