1.2 Describing Distributions with Numbers

Center and spread are the most basic descriptions of what a data set “looks like.” They are intuitively meant to measure exactly what comes to your mind when you hear those terms, but the best way to define them isn’t so obvious. We first investigate center.

Center
What is the center? Good question! The most popular measure of center is the mean, or average value, of a data set. The mean is computed by summing all data values and then dividing by n, the number of values. We denote the mean by $\bar{x}$. The Greek letter Σ (Sigma) is used as shorthand for “sum up.”

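Notation
Suppose we are given a list of data values, each denoted x. Compute $\sum x$, the sum of all the values. We may also combine operations on x with the summation: compute $\sum (x+1)$, which adds 1 to each value before summing. A nice formula for the mean may then be written as
$$\bar{x} = \frac{\sum x}{n}.$$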
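As a quick check of the notation, the sketch below computes $\sum x$, $\sum (x+1)$ and $\bar{x}$ in Python. The data values are hypothetical, since the slide's original list was not preserved.

```python
# Hypothetical data values (not the slide's original list).
data = [4, 7, 1, 8]

n = len(data)                            # number of data values
sum_x = sum(data)                        # Σx: add up every value
sum_x_plus_1 = sum(x + 1 for x in data)  # Σ(x+1): add 1 to each value, then sum
mean = sum_x / n                         # x̄ = Σx / n

print(sum_x, sum_x_plus_1, mean)         # 20 24 5.0
```

Note that $\sum (x+1) = \sum x + n$, because 1 is added once for each of the n values.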

Problems with mean
Consider the data set on the slide and compute its mean. We have that $\bar{x} = 10$, which hardly seems like a good “measure of average.” The problem can be attributed to the value 70. A value that is considerably larger (or smaller) than the rest of the data is known as an outlier. A measure of center that is strongly affected by extreme values, as the mean is here, is not a resistant measure.
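The effect of a single outlier is easy to see numerically. In this sketch the data are hypothetical (the slide's data set was not preserved): five values clustered together, then the same values with an outlier of 70 added.

```python
# Hypothetical data: a tight cluster, then the same cluster plus one outlier.
without_outlier = [2, 3, 4, 5, 6]
with_outlier = without_outlier + [70]

def mean(values):
    """x̄ = Σx / n"""
    return sum(values) / len(values)

mean_without = mean(without_outlier)   # 4.0
mean_with = mean(with_outlier)         # 15.0, larger than every value except 70
print(mean_without, mean_with)
```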

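A Resistant Measure
The median of a collection of data values is the number in the exact middle position of the list when the data are arranged in increasing order. If the number of values n is odd, the median is the value in position (n+1)/2; if n is even, it is the average of the two middle values. We use M to denote the median. Let’s find the median in the example above.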
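A minimal Python sketch of the median rule above, handling odd and even n separately. The data are hypothetical, chosen to contain an outlier.

```python
def median(values):
    """Median M of a list: the middle value of the sorted data, or the
    average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # 0-based index of position (n+1)/2
    return (s[mid - 1] + s[mid]) / 2   # average of the two middle values

# Hypothetical data with an outlier (70).
print(median([2, 3, 4, 5, 6, 70]))   # 4.5
print(median([6, 2, 4, 3, 5]))       # 4
```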

Comparison
The median is a resistant measure of center, unlike the mean, which is a strong advantage. However, the mean takes every numerical value into account, while the median depends only on the middle value(s) of the ordered list: the other values matter only through their position, not their size. There is a formula for the mean, but the median is defined only by a location, position (n+1)/2 in the ordered list.
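To illustrate the comparison, this sketch (hypothetical data) replaces the largest value with a much larger one. The mean, which uses every value, changes; the median, which depends only on the middle of the ordered list, does not. It uses Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data; only the largest value differs between the two lists.
original = [2, 3, 4, 5, 6]
changed = [2, 3, 4, 5, 600]

for label, data in (("original", original), ("changed", changed)):
    # The median stays at 4 in both cases; the mean jumps from 4 to 122.8.
    print(label, "mean =", mean(data), "median =", median(data))
```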

Inadequacies with Centers
What centers do not take into account is the spread of the data. For example, consider the data sets A and B shown on the slide. In both data sets the mean and the median are 20, but the data of B are much more spread out than the data of A. Thus, a center alone is not enough to describe our data; we also need a measure of spread.

Measuring spread
The most obvious measure of spread is the range of a data set: the difference between the highest and lowest data values. The range of a data set is denoted R. Above, $R(A) = 21 - 19 = 2$ and $R(B) = 39 - 1 = 38$. But consider the data sets C and D shown on the slide. Not only are the mean and median the same in both data sets, but so is the range, so the range alone cannot capture how the values between the extremes are spread out.
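The range is a one-line computation. Because the slide's sets C and D were not preserved, the sketch below uses two hypothetical stand-ins with the same mean, median, and range, even though the values of D are spread out while most values of C sit at 20.

```python
from statistics import mean, median

def data_range(values):
    """Range R: highest data value minus lowest data value."""
    return max(values) - min(values)

# Hypothetical stand-ins for C and D (not the slide's original sets).
C = [1, 20, 20, 20, 39]
D = [1, 10, 20, 30, 39]

for name, s in (("C", C), ("D", D)):
    # Both lines print mean 20, median 20, range 38.
    print(name, mean(s), median(s), data_range(s))
```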

Percentile
The mth percentile is the number that separates the bottom m% of the data from the top (100-m)% of the data. It is denoted by $P_m$. Note that the median of a data set is the 50th percentile, so $P_{50} = M$.
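Textbooks and software compute percentiles with several slightly different conventions. The sketch below assumes linear interpolation between neighboring values of the sorted data (the same rule Python's `statistics.quantiles` applies with `method='inclusive'`). It reproduces $P_{50} = M$, but other $P_m$ may differ a little from values found by hand with other rules. The data set is the one from the boxplot example in this section.

```python
import math

def percentile(values, m):
    """P_m by linear interpolation between order statistics.
    This is one common convention among several (an assumption here)."""
    if not 0 <= m <= 100:
        raise ValueError("m must be between 0 and 100")
    s = sorted(values)
    if not s:
        raise ValueError("percentile of an empty list is undefined")
    h = (len(s) - 1) * m / 100   # fractional 0-based position of P_m
    lo, hi = math.floor(h), math.ceil(h)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

data = [33, 36, 37, 37, 38, 41, 42, 42, 42, 45,
        47, 52, 54, 55, 56, 56, 57, 60, 78, 92]

print(percentile(data, 50))   # 46.0, the median M
print(percentile(data, 25))   # 40.25 under this convention
```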

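Quartiles and IQR
We define the first, second, and third quartiles by $Q_1 = P_{25}$, $Q_2 = P_{50}$, and $Q_3 = P_{75}$. The interquartile range, denoted IQR, is defined by $IQR = Q_3 - Q_1$. It is easy to convince yourself that the IQR is a resistant measure: making the lowest or highest quarter of the data more extreme, without changing their order, does not move $Q_1$ or $Q_3$.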
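A sketch of the quartiles and IQR, assuming the common hand-calculation rule in which $Q_1$ and $Q_3$ are the medians of the lower and upper halves of the ordered data (with the overall median left out of both halves when n is odd). This is an assumed convention; software packages may interpolate instead and give slightly different values. The last line checks resistance by replacing the largest value, 92, with a hypothetical 920.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """(Q1, Q2, Q3) via medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]            # values before the median position
    upper = s[(n + 1) // 2 :]      # values after the median position
    return median(lower), median(s), median(upper)

def iqr(values):
    """IQR = Q3 - Q1"""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [33, 36, 37, 37, 38, 41, 42, 42, 42, 45,
        47, 52, 54, 55, 56, 56, 57, 60, 78, 92]

print(quartiles(data))            # (39.5, 46.0, 56.0)
print(iqr(data))                  # 16.5
print(iqr(data[:-1] + [920]))     # still 16.5: the outlier does not move Q1 or Q3
```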

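The 5 Number Summary
We study a way to present a data set so that the reader can easily read off the quartiles and the lowest and highest values. These five numbers, Min, $Q_1$, $M$, $Q_3$, Max, form the five-number summary, and a boxplot displays them: a box runs from $Q_1$ to $Q_3$ with a line at the median $M$, and whiskers extend from the box to the minimum and the maximum.
[Figure: example boxplot labeled Min, $Q_1$, $M$, $Q_3$, Max]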

E.g.
Consider the data set {33, 36, 37, 37, 38, 41, 42, 42, 42, 45, 47, 52, 54, 55, 56, 56, 57, 60, 78, 92}. Construct a boxplot.
To identify outliers, we use a modified boxplot. Instead of drawing the whiskers from $Q_1$ to the lowest value and from $Q_3$ to the highest value, we draw the upper whisker from $Q_3$ to the largest data value between $Q_3$ and $Q_3 + 1.5 \times IQR$. The lower whisker is drawn from $Q_1$ to the smallest data value between $Q_1 - 1.5 \times IQR$ and $Q_1$. Any data value not covered by the box and whiskers is considered an outlier and is plotted individually. Construct a modified boxplot for the above data.
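The sketch below computes the five-number summary and the modified-boxplot whiskers and outliers for this data set. It assumes the median-of-halves rule for $Q_1$ and $Q_3$ commonly used by hand (an assumed convention; other quartile rules shift the values slightly) and the $1.5 \times IQR$ fences described above. It draws nothing; it only returns the numbers a boxplot would show.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

def five_number_summary(values):
    """(Min, Q1, M, Q3, Max), with Q1 and Q3 as medians of the lower/upper halves."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return s[0], median(lower), median(s), median(upper), s[-1]

def modified_boxplot(values, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers) using k x IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = k * (q3 - q1)
    low_fence, high_fence = q1 - spread, q3 + spread
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    return min(inside), max(inside), outliers

data = [33, 36, 37, 37, 38, 41, 42, 42, 42, 45,
        47, 52, 54, 55, 56, 56, 57, 60, 78, 92]

print(five_number_summary(data))  # (33, 39.5, 46.0, 56.0, 92)
print(modified_boxplot(data))     # (33, 78, [92])
```

With these quartiles the fences are 14.75 and 80.75, so 92 is the only outlier and the upper whisker stops at 78.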

Another measure of spread
There are other things we can do. Let’s experiment with the data set on the board. We have just “discovered” the standard deviation. The standard deviation is denoted by s, and the variance is denoted by $s^2$. They are defined by
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}.$$
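The definitions translate directly into code. The data below are hypothetical (the data set on the board was not preserved); the results can be cross-checked against Python's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = Σ(x - x̄)^2 / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s = sqrt(s^2)"""
    return math.sqrt(sample_variance(values))

# Hypothetical data (not the data set from the board).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```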

Problems and problems
Compute the standard deviation for the data sets A and B. “[The standard deviation]… will be large if the observations are widely spread about their mean, and small if the observations are close to their mean.” Unfortunately, unlike the IQR, the range, the variance, and the standard deviation are not resistant; that is, they are sensitive to outliers.
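A quick numerical check of this sensitivity, using the data set from the boxplot example and a hypothetical variant in which the largest value, 92, is replaced by 920. The range and the standard deviation both grow sharply, whereas $Q_1$ and $Q_3$, and hence the IQR, would not move.

```python
import statistics

data = [33, 36, 37, 37, 38, 41, 42, 42, 42, 45,
        47, 52, 54, 55, 56, 56, 57, 60, 78, 92]
extreme = data[:-1] + [920]   # hypothetical: one value made far more extreme

results = {}
for label, s in (("original", data), ("92 -> 920", extreme)):
    r = max(s) - min(s)             # range R
    sd = statistics.stdev(s)        # sample standard deviation s (divides by n - 1)
    results[label] = (r, sd)
    print(label, "R =", r, "s =", round(sd, 2))
```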