MAT 446 Supplementary Note for Ch 1

MAT 446 Supplementary Note for Ch 1 Myung Song, Ph.D. © 2009 W.H. Freeman and Company

Key Statistical Concepts (Keller: Stats for Mgmt&Econ, 7th Ed., May 14, 2018) Population - The group of all items of interest to a statistics practitioner; frequently very large, sometimes infinite. E.g., all Florida voters. Sample - A set of data drawn from the population; potentially very large, but smaller than the population. E.g., a sample of 765 voters exit-polled on election day.

Key Statistical Concepts Parameter - A descriptive measure of a population. Statistic - A descriptive measure of a sample.

Key Statistical Concepts [Diagram: a sample is a subset of the population; a parameter describes the population, a statistic describes the sample.] Populations have parameters; samples have statistics.

Stem plot (stem-and-leaf) Suppose your data set is the ages of 12 persons: {9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70}. How to make a stem plot: split each data value into a "leaf" (usually the last digit) and a "stem" (the other digits). For example, "32" is split into "3" (stem) and "2" (leaf); a single-digit value such as "9" gets the stem "0". The stem values are listed down a column, and the leaf values are listed next to them. This way each stem groups the scores and each leaf indicates a score within that group. [Figure: stem plot of the ages, with columns STEM and LEAVES]
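The splitting rule above can be sketched in a few lines of Python. This is only an illustration for whole-number data with the last digit as the leaf; the function name stem_plot is a hypothetical helper, not something from the slides.

```python
from collections import defaultdict

def stem_plot(values):
    """Return (stem, leaves) rows for whole-number data, last digit = leaf."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 32 -> stem 3, leaf 2
        rows[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return [(s, rows.get(s, [])) for s in range(min(rows), max(rows) + 1)]

ages = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
for stem, leaves in stem_plot(ages):
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Stems 1 and 6 print with no leaves, which shows that no one in the data set is in their teens or sixties.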

Stem plot (stem-and-leaf) To compare two related distributions, a back-to-back stem plot with common stems is useful. Stem plots do not work well for large datasets. When the observed values have too many digits, trim the numbers before making a stem plot. When plotting a moderate number of observations, you can split each stem.

Histograms Example: Weight Data―Introductory Statistics Class

Describing Distributions A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other. A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side. [Figure panels: Symmetric, Skewed-left, Skewed-right]

Pie Charts and Bar Graphs
US Solid Waste (2000)

Material                    Weight (million tons)   Percent of total
Food scraps                          25.9                 11.2%
Glass                                12.8                  5.5%
Metals                               18.0                  7.8%
Paper, paperboard                    86.7                 37.4%
Plastics                             24.7                 10.7%
Rubber, leather, textiles            15.8                  6.8%
Wood                                 12.7                  5.5%
Yard trimmings                       27.7                 11.9%
Other                                 7.5                  3.2%
Total                               231.9                100.0%
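Each percent in the solid-waste table is the material's weight divided by the total weight. A minimal sketch of that arithmetic follows; it assumes the stated total of 231.9 million tons is used as the denominator (the listed weights add to 231.8 because of rounding).

```python
# Weights in million tons, copied from the US Solid Waste (2000) table.
waste = {
    "Food scraps": 25.9,
    "Glass": 12.8,
    "Metals": 18.0,
    "Paper, paperboard": 86.7,
    "Plastics": 24.7,
    "Rubber, leather, textiles": 15.8,
    "Wood": 12.7,
    "Yard trimmings": 27.7,
    "Other": 7.5,
}
TOTAL = 231.9  # total as stated in the table (assumed denominator)

# Percent of total, rounded to one decimal place as in the table.
percents = {name: round(100 * w / TOTAL, 1) for name, w in waste.items()}

for name, pct in percents.items():
    print(f"{name:28s}{pct:5.1f}%")
```

These percents are the slice sizes of the pie chart and the bar heights of the bar graph.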

Pie Charts and Bar Graphs

Measuring Center: The Mean The most common measure of center is the arithmetic average, or mean. To find the mean $\bar{x}$ (pronounced "x-bar") of a set of observations, add their values and divide by the number of observations. If the n observations are x1, x2, x3, …, xn, their mean is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
or, in more compact notation,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
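The definition of the mean translates directly into code: add the values and divide by how many there are. The sketch below uses made-up numbers, not data from the slides, and the function name mean is just an illustrative choice.

```python
def mean(values):
    """Arithmetic average: sum of the observations divided by their count."""
    if len(values) == 0:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

scores = [2, 4, 6, 8]   # made-up observations for illustration
print(mean(scores))     # (2 + 4 + 6 + 8) / 4 = 5.0
```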

Measuring Center: The Median Because the mean cannot resist the influence of extreme observations, it is not a resistant measure of center. Another common measure of center is the median. The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution: Arrange all observations from smallest to largest. If the number of observations n is odd, the median M is the center observation in the ordered list. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
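The median procedure can be sketched as follows. The data are made up to show resistance: one extreme observation pulls the mean far from the bulk of the data but leaves the median alone. The name median_of is a hypothetical helper.

```python
def median_of(values):
    """Median: middle value of the sorted list, or average of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                      # odd n: the center observation
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2  # even n: average the two center observations

data = [1, 2, 3, 4, 100]                # made-up data; 100 is an extreme observation
print(sum(data) / len(data))            # mean = 22.0, dragged toward the extreme value
print(median_of(data))                  # median = 3, unaffected by how large 100 is
print(median_of([1, 2, 3, 4]))          # even n: (2 + 3) / 2 = 2.5
```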

Measuring Center Use the data below to calculate the mean and median of the commuting times (in minutes) of 20 randomly selected New York workers.
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
Key: 4|5 represents a New York worker who reported a 45-minute travel time to work.
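One way to check the exercise is with Python's standard statistics module. The list below is read off the stem plot (each leaf is a units digit, so 1|005555 gives 10, 10, 15, 15, 15, 15); the variable name times is an assumption made for this sketch.

```python
import statistics

# The 20 New York travel times (minutes), read from the stem plot.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

mean_time = statistics.mean(times)      # 625 / 20
median_time = statistics.median(times)  # average of the 10th and 11th values: (20 + 25) / 2

print(mean_time)    # 31.25
print(median_time)  # 22.5
```

The mean (31.25 minutes) is larger than the median (22.5 minutes), which is what a right-skewed distribution usually produces.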

Comparing the Mean and Median The mean and median measure center in different ways, and both are useful. Comparing the Mean and the Median The mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.

Measuring Spread: Quartiles A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread. How to Calculate the Quartiles and the Interquartile Range To calculate the quartiles: Arrange the observations in increasing order and locate the median M. The first quartile Q1 is the median of the observations located to the left of the median in the ordered list. The third quartile Q3 is the median of the observations located to the right of the median in the ordered list. The interquartile range (IQR) is defined as: IQR = Q3 – Q1
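The quartile rule described here (the median of each half, excluding the overall median when n is odd) can be sketched as below. Statistical software often uses other quartile conventions, so results from a library function may differ slightly; the helper name quartiles is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # observations left of the median
    upper = xs[half + 1:] if n % 2 else xs[half:]  # observations right of the median
    return median(lower), median(upper)

# New York travel times (minutes) from the earlier stem plot.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

q1, q3 = quartiles(times)
iqr = q3 - q1
print(q1, q3, iqr)   # 15.0 42.5 27.5
```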

Five-Number Summary The minimum and maximum values alone tell us little about the distribution as a whole. Likewise, the median and quartiles tell us little about the tails of a distribution. To get a quick summary of both center and spread, combine all five numbers. The five-number summary of a distribution consists of the smallest observation, the first quartile (lower fourth), the median, the third quartile (upper fourth), and the largest observation, written in order from smallest to largest: Minimum, Q1, M, Q3, Maximum.
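A short sketch of the five-number summary, assuming quartiles are computed as the medians of the lower and upper halves of the ordered data. The function name five_number_summary is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves quartiles."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# New York travel times (minutes) from the earlier stem plot.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

summary = five_number_summary(times)
print(summary)   # (5, 15.0, 22.5, 42.5, 85)
```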

Boxplots The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot. How to Make a Boxplot Draw and label a number line that includes the range of the distribution. Draw a central box from Q1 to Q3. Note the median M inside the box. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.

Suspected Outliers: The 1.5 × IQR Rule In addition to serving as a measure of spread, the interquartile range (IQR), or fourth spread, is used as part of a rule of thumb for identifying outliers. The 1.5 × IQR Rule for Outliers: Call an observation an outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile. In the New York travel time data, we found Q1 = 15 minutes, Q3 = 42.5 minutes, and IQR = 27.5 minutes. For these data,
1.5 × IQR = 1.5(27.5) = 41.25
Q1 – 1.5 × IQR = 15 – 41.25 = –26.25
Q3 + 1.5 × IQR = 42.5 + 41.25 = 83.75
Any travel time shorter than –26.25 minutes or longer than 83.75 minutes is considered an outlier. Since no travel time can be negative, only the upper cutoff can flag an observation here.
0 | 5
1 | 005555
2 | 0005
3 | 00
4 | 005
5 |
6 | 005
7 |
8 | 5
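The rule can be sketched as a small function that takes Q1 and Q3 and returns the two cutoffs plus the flagged observations. The name iqr_outliers is a hypothetical helper, and the quartile values are the ones stated in the slide.

```python
def iqr_outliers(values, q1, q3):
    """Apply the 1.5 x IQR rule: return (low cutoff, high cutoff, flagged values)."""
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    flagged = [x for x in values if x < low or x > high]
    return low, high, flagged

# New York travel times (minutes) from the stem plot.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

low, high, flagged = iqr_outliers(times, q1=15, q3=42.5)
print(low, high)   # -26.25 83.75
print(flagged)     # [85]
```

Only the 85-minute commute lies beyond a cutoff, so it is the single suspected outlier in these data.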

Boxplots Consider our NY travel times data. Construct a boxplot.
Sorted data: 5 10 10 15 15 15 15 20 20 20 25 30 30 40 40 45 60 60 65 85
Min = 5, Q1 = 15, M = 22.5, Q3 = 42.5, Max = 85
[Figure: boxplot of the travel times; the point at 85 is labeled "This is an outlier by the 1.5 × IQR rule"]
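The pieces needed to draw this boxplot by hand can be computed as below: the box edges, the median line, the whisker ends (the smallest and largest observations that are not outliers), and the points plotted separately. This is a sketch under the median-of-halves quartile convention; plotting libraries may place quartiles slightly differently.

```python
from statistics import median

# New York travel times (minutes), sorted.
times = [5, 10, 10, 15, 15, 15, 15, 20, 20, 20,
         25, 30, 30, 40, 40, 45, 60, 60, 65, 85]

xs = sorted(times)
half = len(xs) // 2
lower = xs[:half]
upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]

q1, m, q3 = median(lower), median(xs), median(upper)
iqr = q3 - q1
low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations that are not outliers.
inside = [x for x in xs if low_cut <= x <= high_cut]
whisker_low, whisker_high = inside[0], inside[-1]
outliers = [x for x in xs if x < low_cut or x > high_cut]

print("box:", q1, "to", q3, "median line at", m)
print("whiskers:", whisker_low, "to", whisker_high)
print("plotted separately:", outliers)
```

The upper whisker ends at 65 rather than 85 because 85 is drawn as an individual outlier point.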